EDU 2: to benefit from the non-linearity of corpus-wide statistics for part-of-speech ( pos ) tagging .
EDU 3: we investigated several types of corpus-wide information for the words , such as word embeddings and pos tag distributions .
EDU 4: since these statistics are encoded as dense continuous features ,
EDU 5: it is not trivial to combine these features
EDU 6: compared with sparse discrete features .
EDU 7: our tagger is designed as a combination of a linear model for discrete features and a feed-forward neural network
EDU 8: that captures the non-linear interactions among the continuous features .
EDU 9: by using several recent advances in the activation functions for neural networks ,
EDU 10: the proposed method marks new state-of-the-art accuracies for english pos tagging tasks .
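The tagger described above combines a linear model over discrete features with a feed-forward network over continuous ones. A minimal sketch of that hybrid scoring scheme follows; all weights, the tanh activation, and the feature names are illustrative assumptions, not the authors' actual model.

```python
import math

def linear_score(sparse_feats, weights):
    """Linear part: sum of weights for the active discrete features."""
    return sum(weights.get(f, 0.0) for f in sparse_feats)

def mlp_score(dense_feats, W1, b1, w2, b2):
    """One-hidden-layer feed-forward net over continuous features
    (tanh here stands in for the paper's activation functions)."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, dense_feats)) + b)
              for row, b in zip(W1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2

def tag_score(sparse_feats, dense_feats, weights, W1, b1, w2, b2):
    """Hybrid score: linear model for discrete features plus a neural
    part capturing non-linear interactions among the dense ones."""
    return linear_score(sparse_feats, weights) + mlp_score(dense_feats, W1, b1, w2, b2)
```

A tag's final score is simply the sum of the two parts, so the discrete and continuous components can be trained and inspected separately.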
EDU 0:
EDU 1: different approaches to high-quality grammatical error correction have been proposed recently ,
EDU 2: many of which have their own strengths and weaknesses .
EDU 3: most of these approaches are based on classification or statistical machine translation ( smt ) .
EDU 4: in this paper , we propose to combine the output from a classification-based system and an smt-based system
EDU 5: to improve the correction quality .
EDU 6: we adopt the system combination technique of heafield and lavie ( 0000 ) .
EDU 7: we achieve an f0.0 score of 00.00 % on the test set of the conll-0000 shared task ,
EDU 8: outperforming the best system in the shared task .
EDU 0:
EDU 1: in this paper we propose a method
EDU 2: to increase dependency parser performance
EDU 3: without using additional labeled or unlabeled data
EDU 4: by refining the layer of predicted part-of-speech ( pos ) tags .
EDU 5: we perform experiments on english and german
EDU 6: and show significant improvements for both languages .
EDU 7: the refinement is based on generative split-merge training for hidden markov models ( hmms ) .
EDU 0:
EDU 1: importance weighting is a generalization of various statistical bias correction techniques .
EDU 2: while our labeled data in nlp is heavily biased ,
EDU 3: importance weighting has seen only a few applications in nlp ,
EDU 4: most of them relying on a small amount of labeled target data .
EDU 5: the publication bias
EDU 6: toward reporting positive results
EDU 7: makes it hard to say whether researchers have tried .
EDU 8: this paper presents a negative result on unsupervised domain adaptation for pos tagging .
EDU 9: in this setup , we only have unlabeled data
EDU 10: and thus only indirect access to the bias in emission and transition probabilities .
EDU 11: moreover , most errors in pos tagging are due to unseen words ,
EDU 12: and there , importance weighting cannot help .
EDU 13: we present experiments with a wide variety of weight functions , quantilizations , as well as with randomly generated weights ,
EDU 14: to support these claims .
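Importance weighting, as used above, scales each training example's loss by a weight w(x) approximating p_target(x)/p_source(x). A minimal sketch with weighted logistic regression; the learning rate, weight function, and update are illustrative, not the paper's setup.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def weighted_gradient_step(examples, weight_fn, theta, lr=0.1):
    """One gradient step of logistic regression where each example's
    log-loss is scaled by its importance weight w(x) ~ p_target/p_source."""
    grad = [0.0] * len(theta)
    for x, y in examples:                 # x: feature vector, y: 0/1 label
        w = weight_fn(x)                  # importance weight for this example
        p = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
        for i, xi in enumerate(x):
            grad[i] += w * (p - y) * xi   # weighted gradient contribution
    return [t - lr * g for t, g in zip(theta, grad)]
```

Setting all weights to 1 recovers plain maximum likelihood, which is why the abstract can compare hand-designed, quantilized, and random weight functions in the same framework.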
EDU 0:
EDU 1: code-mixing is frequently observed in user generated content on social media , especially from multilingual users .
EDU 2: the linguistic complexity of such content is compounded by the presence of spelling variations , transliteration and non-adherence to formal grammar .
EDU 3: we describe our initial efforts
EDU 4: to create a multi-level annotated corpus of hindi-english code-mixed text
EDU 5: collated from facebook forums ,
EDU 6: and explore language identification , back-transliteration , normalization and pos tagging of this data .
EDU 7: our results show
EDU 8: that language identification and transliteration for hindi are two major challenges
EDU 9: that impact pos tagging accuracy .
EDU 0:
EDU 1: we investigate grammatical error detection in spoken language ,
EDU 2: and present a data-driven method
EDU 3: to train a dependency parser
EDU 4: to automatically identify and label grammatical errors .
EDU 5: this method is agnostic to the label set used ,
EDU 6: and the only manual annotations
EDU 7: needed for training
EDU 8: are grammatical error labels .
EDU 9: we find
EDU 10: that the proposed system is robust to disfluencies ,
EDU 11: so that a separate stage to elide disfluencies is not required .
EDU 12: the proposed system outperforms two baseline systems on two different corpora
EDU 13: that use different sets of error tags .
EDU 14: it is able to identify utterances with grammatical errors with an f0-score as high as 0.000 ,
EDU 15: as compared to a baseline f0 of 0.000 on the same data .
EDU 0:
EDU 1: we introduce a new ccg parsing model
EDU 2: which is factored on lexical category assignments .
EDU 3: parsing is then simply a deterministic search for the most probable category sequence
EDU 4: that supports a ccg derivation .
EDU 5: the parser is extremely simple , with a tiny feature set , no pos tagger , and no statistical model of the derivation or dependencies .
EDU 6: formulating the model in this way allows a highly effective heuristic for a* parsing ,
EDU 7: which makes parsing extremely fast .
EDU 8: compared to the standard c&c ccg parser ,
EDU 9: our model is more accurate out-of-domain ,
EDU 10: is four times faster ,
EDU 11: has higher coverage ,
EDU 12: and is greatly simplified .
EDU 13: we also show
EDU 14: that using our parser improves the performance of a state-of-the-art question answering system .
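Because the model above is factored on lexical category assignments, parsing reduces to finding the highest-scoring category sequence that passes a derivability check, and each word's best category score gives an admissible a* heuristic for the unassigned suffix. A sketch, with the derivability check reduced to an arbitrary validity predicate; the category names and scores are made up.

```python
import heapq

def astar_best_sequence(word_scores, is_valid):
    """A* search for the highest-scoring category sequence accepted by
    `is_valid` (a stand-in for "supports a ccg derivation").
    word_scores: one {category: log_prob} dict per word.
    Heuristic for unassigned words: each word's best category score,
    an optimistic and hence admissible estimate."""
    n = len(word_scores)
    best_rest = [sum(max(s.values()) for s in word_scores[i:]) for i in range(n)] + [0.0]
    heap = [(-best_rest[0], ())]          # (-(score_so_far + heuristic), prefix)
    while heap:
        neg_f, prefix = heapq.heappop(heap)
        i = len(prefix)
        if i == n:
            if is_valid(prefix):          # first valid complete pop is optimal
                return list(prefix)
            continue
        g = -neg_f - best_rest[i]         # exact score of the prefix so far
        for cat, s in word_scores[i].items():
            heapq.heappush(heap, (-(g + s + best_rest[i + 1]), prefix + (cat,)))
    return None
```

The heuristic never underestimates the best completion, so the first valid complete sequence popped from the queue is guaranteed optimal, which is what makes this kind of parsing fast in practice.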
EDU 0:
EDU 1: we describe a new dependency parser for english tweets , tweeboparser .
EDU 2: the parser builds on several contributions :
EDU 3: new syntactic annotations for a corpus of tweets ( tweebank ) , with conventions
EDU 4: informed by the domain ;
EDU 5: adaptations to a statistical parsing algorithm ;
EDU 6: and a new approach to exploiting out-of-domain penn treebank data .
EDU 7: our experiments show
EDU 8: that the parser achieves over 00 % unlabeled attachment accuracy on our new , high-quality test set
EDU 9: and measure the benefit of our contributions .
EDU 10: our dataset and parser can be found at http://www.ark.cs.cmu.edu/tweetnlp .
EDU 0:
EDU 1: dependency parsing with high-order features results in a provably hard decoding problem .
EDU 2: a lot of work has gone into developing powerful optimization methods
EDU 3: for solving these combinatorial problems .
EDU 4: in contrast , we explore , analyze , and demonstrate
EDU 5: that a substantially simpler randomized greedy inference algorithm already suffices for near optimal parsing :
EDU 6: a) we analytically quantify the number of local optima
EDU 7: that the greedy method has to overcome in the context of first-order parsing ;
EDU 8: b) we show
EDU 9: that , as a decoding algorithm , the greedy method surpasses dual decomposition in second-order parsing ;
EDU 10: c) we empirically demonstrate
EDU 11: that our approach with up to third-order and global features outperforms the state-of-the-art dual decomposition and mcmc sampling methods
EDU 12: when evaluated on 00 languages of non-projective conll datasets .
EDU 0:
EDU 1: most word representation methods assume
EDU 2: that each word owns a single semantic vector .
EDU 3: this is usually problematic
EDU 4: because lexical ambiguity is ubiquitous ,
EDU 5: which is also the problem
EDU 6: to be resolved by word sense disambiguation .
EDU 7: in this paper , we present a unified model for joint word sense representation and disambiguation ,
EDU 8: which will assign distinct representations for each word sense .
EDU 9: the basic idea is that both word sense representation ( wsr ) and word sense disambiguation ( wsd ) will benefit from each other :
EDU 10: ( 0 ) high-quality wsr will capture rich information about words and senses ,
EDU 11: which should be helpful for wsd ,
EDU 12: and ( 0 ) high-quality wsd will provide reliable disambiguated corpora
EDU 13: for learning better sense representations .
EDU 14: experimental results show
EDU 15: that our model improves the performance of contextual word similarity
EDU 16: compared to existing wsr methods ,
EDU 17: outperforms state-of-the-art supervised methods on domain-specific wsd ,
EDU 18: and achieves competitive performance on coarse-grained all-words wsd .
EDU 0:
EDU 1: compositional distributional semantics is a subfield of computational linguistics
EDU 2: which investigates methods
EDU 3: for representing the meanings of phrases and sentences .
EDU 4: in this paper , we explore implementations of a framework
EDU 5: based on combinatory categorial grammar ( ccg ) ,
EDU 6: in which words with certain grammatical types have meanings
EDU 7: represented by multi-linear maps
EDU 8: ( i.e. multi-dimensional arrays , or tensors ) .
EDU 9: an obstacle to full implementation of the framework is the size of these tensors .
EDU 10: we examine the performance of lower dimensional approximations of transitive verb tensors on a sentence plausibility/selectional preference task .
EDU 11: we find
EDU 12: that the matrices perform as well as , and sometimes even better than , full tensors ,
EDU 13: allowing a reduction in the number of parameters
EDU 14: needed to model the framework .
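In the matrix approximation examined above, a transitive verb acts as a map on the object vector, combined with the subject by an inner product to give a scalar plausibility. A minimal sketch; the vectors and matrix values are invented for illustration.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def svo_plausibility(subj, verb_matrix, obj):
    """Lower-dimensional approximation of a transitive-verb tensor:
    the verb is a matrix applied to the object vector, and the result
    is combined with the subject by an inner product, yielding a
    scalar plausibility score for the subject-verb-object triple."""
    return dot(subj, matvec(verb_matrix, obj))
```

A full tensor for an n-dimensional space needs n^3 parameters per verb, whereas this matrix form needs only n^2, which is the parameter reduction the abstract reports.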
EDU 0:
EDU 1: in this paper we propose a computational method
EDU 2: for determining the orthographic similarity between romanian and related languages .
EDU 3: we account for etymons and cognates
EDU 4: and we investigate not only the number of related words , but also their forms ,
EDU 5: quantifying orthographic similarities .
EDU 6: the method we propose is adaptable to any language ,
EDU 7: as long as resources are available .
EDU 0:
EDU 1: there is rising interest in vector-space word embeddings and their use in nlp ,
EDU 2: especially given recent methods for their fast estimation at very large scale .
EDU 3: nearly all this work , however , assumes a single vector per word type , ignoring polysemy
EDU 4: and thus jeopardizing their usefulness for downstream tasks .
EDU 5: we present an extension to the skip-gram model
EDU 6: that efficiently learns multiple embeddings per word type .
EDU 7: it differs from recent related work
EDU 8: by jointly performing word sense discrimination and embedding learning ,
EDU 9: by non-parametrically estimating the number of senses per word type , and by its efficiency and scalability .
EDU 10: we present new state-of-the-art results in the word similarity in context task
EDU 11: and demonstrate its scalability
EDU 12: by training with one machine on a corpus of nearly 0 billion tokens in less than 0 hours .
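The non-parametric sense estimation described above can be sketched as follows: each occurrence is assigned to the sense whose vector best matches the averaged context, and a new sense is allocated when no existing sense is similar enough. The threshold and vectors are illustrative assumptions, not the paper's parameterization.

```python
def cosine(u, v):
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def assign_sense(context_vec, sense_vecs, new_sense_threshold=0.3):
    """Non-parametric sense choice for one word type: pick the sense
    whose vector is most similar to the averaged context; if none is
    similar enough, allocate a new sense (mutates sense_vecs)."""
    if not sense_vecs:
        sense_vecs.append(list(context_vec))
        return 0
    sims = [cosine(context_vec, s) for s in sense_vecs]
    best = max(range(len(sims)), key=lambda i: sims[i])
    if sims[best] < new_sense_threshold:
        sense_vecs.append(list(context_vec))   # spawn a new sense
        return len(sense_vecs) - 1
    return best
```

In the full model the chosen sense's embedding would then be updated by the usual skip-gram gradient, which is what makes sense discrimination and embedding learning joint.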
EDU 0:
EDU 1: knowledge graphs have recently been used
EDU 2: for enriching query representations in an entity-aware way , exploiting the rich facts
EDU 3: organized around entities in them .
EDU 4: however , few of the methods pay attention to non-entity words
EDU 5: and clicked websites in queries ,
EDU 6: which also help convey user intent .
EDU 7: in this paper , we tackle the problem of intent understanding by
EDU 8: representing entity words , refiners and clicked urls as intent topics in a unified knowledge graph based framework , in a way
EDU 9: that exploits and expands the knowledge graph ,
EDU 10: which we call "tailor" .
EDU 11: we collaboratively exploit global knowledge in knowledge graphs and local contexts in query log
EDU 12: to initialize intent representation ,
EDU 13: then propagate the enriched features in a graph
EDU 14: consisting of intent topics
EDU 15: using an unsupervised algorithm .
EDU 16: the experiments show that intent topics with knowledge graph enriched features significantly enhance intent understanding .
EDU 0:
EDU 1: the role of web search queries has been demonstrated in the extraction of attributes of instances and classes , or of sets of related instances and their class labels .
EDU 2: this paper explores the acquisition of open-domain commonsense knowledge , usually available as factual knowledge , from web search queries .
EDU 3: similarly to previous work in open-domain information extraction ,
EDU 4: knowledge extracted from text - in this case , from queries - takes the form of lexicalized assertions
EDU 5: associated with open-domain classes .
EDU 6: experimental results indicate
EDU 7: that facts
EDU 8: extracted from queries complement ,
EDU 9: and have competitive accuracy levels relative to , facts
EDU 10: extracted from web documents by previous methods .
EDU 0:
EDU 1: question answering over linked data ( qald ) aims to evaluate a question answering system over structured data ,
EDU 2: the key objective of which is to translate questions
EDU 3: posed using natural language
EDU 4: into structured queries .
EDU 5: this technique can help common users to directly access open-structured knowledge on the web
EDU 6: and , accordingly , has attracted much attention .
EDU 7: to this end , we propose a novel method
EDU 8: using first-order logic .
EDU 9: we formulate the knowledge
EDU 10: for resolving the ambiguities in the main three steps of qald
EDU 11: ( phrase detection , phrase-to-semantic-item mapping and semantic item grouping )
EDU 12: as first-order logic clauses in a markov logic network .
EDU 13: all clauses can then produce interacted effects in a unified framework
EDU 14: and can jointly resolve all ambiguities .
EDU 15: moreover , our method adopts a pattern-learning strategy for semantic item grouping .
EDU 16: in this way , our method can cover more text expressions
EDU 17: and answer more questions than previous methods
EDU 18: using manually designed patterns .
EDU 19: the experimental results
EDU 20: using open benchmarks
EDU 21: demonstrate the effectiveness of the proposed method .
EDU 0:
EDU 1: much recent work focuses on formal interpretation of natural question utterances ,
EDU 2: with the goal of executing the resulting structured queries on knowledge graphs ( kgs ) such as freebase .
EDU 3: here we address two limitations of this approach
EDU 4: when applied to open-domain , entity-oriented web queries .
EDU 5: first , web queries are rarely well-formed questions .
EDU 6: they are "telegraphic" , with missing verbs , prepositions , clauses , case and phrase clues .
EDU 7: second , the kg is always incomplete ,
EDU 8: unable to directly answer many queries .
EDU 9: we propose a novel technique
EDU 10: to segment a telegraphic query
EDU 11: and assign a coarse-grained purpose to each segment :
EDU 12: a base entity e0 , a relation type r , a target entity type t0 , and contextual words s .
EDU 13: the query seeks entity e0 ∈ t0
EDU 14: where r ( e0 , e0 ) holds ,
EDU 15: further evidenced by schema-agnostic words s .
EDU 16: query segmentation is integrated with the kg and an unstructured corpus
EDU 17: where mentions of entities have been linked to the kg .
EDU 18: we do not trust the best or any specific query segmentation .
EDU 19: instead , evidence in favor of candidate e0s is aggregated across several segmentations .
EDU 20: extensive experiments on the clueweb corpus and parts of freebase as our kg ,
EDU 21: using over a thousand telegraphic queries
EDU 22: adapted from trec , inex , and webquestions ,
EDU 23: show the efficacy of our approach .
EDU 24: for one benchmark , map improves from 0.0-0.00 ( competitive baselines ) to 0.00 ( our system ) .
EDU 25: ndcg @ 00 improves from 0.00-0.00 to 0.00 .
EDU 0:
EDU 1: estimating questions' difficulty levels is an important task in community question answering ( cqa ) services .
EDU 2: previous studies propose to solve this problem based on the question-user comparisons
EDU 3: extracted from the question answering threads .
EDU 4: however , they suffer from a data sparseness problem
EDU 5: as each question only gets a limited number of comparisons .
EDU 6: moreover , they cannot handle newly posted questions
EDU 7: which get no comparisons .
EDU 8: in this paper , we propose a novel question difficulty estimation approach
EDU 9: called regularized competition model ( rcm ) ,
EDU 10: which naturally combines question-user comparisons and questions' textual descriptions into a unified framework .
EDU 11: by incorporating textual information ,
EDU 12: rcm can effectively deal with the data sparseness problem .
EDU 13: we further employ a k-nearest neighbor approach
EDU 14: to estimate difficulty levels of newly posted questions ,
EDU 15: again by leveraging textual similarities .
EDU 16: experiments on two publicly available data sets show
EDU 17: that for both well-resolved and newly-posted questions , rcm performs the estimation task significantly better than existing methods ,
EDU 18: demonstrating the advantage of incorporating textual information .
EDU 19: more interestingly , we observe
EDU 20: that rcm might provide an automatic way
EDU 21: to quantitatively measure the knowledge levels of words .
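The k-nearest-neighbor step for newly posted questions can be sketched directly: estimate a new question's difficulty as the mean difficulty of its most textually similar resolved questions. Jaccard word overlap below is a stand-in for the paper's textual similarity, and all questions and difficulty values are invented.

```python
def knn_difficulty(new_q, labeled, k=3):
    """Estimate a new question's difficulty as the mean difficulty of
    its k most textually similar resolved questions.
    labeled: list of (question_text, difficulty) pairs."""
    def jaccard(a, b):
        sa, sb = set(a.split()), set(b.split())
        return len(sa & sb) / len(sa | sb) if sa | sb else 0.0
    ranked = sorted(labeled, key=lambda qd: jaccard(new_q, qd[0]), reverse=True)
    top = ranked[:k]
    return sum(d for _, d in top) / len(top)
```

This is exactly the cold-start path: a question with no question-user comparisons still gets an estimate purely from text.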
EDU 0:
EDU 1: a poll consists of a question and a set of predefined answers
EDU 2: from which voters can select .
EDU 3: we present the new problem of vote prediction on comments ,
EDU 4: which involves determining which of these answers a voter selected
EDU 5: given a comment
EDU 6: she wrote
EDU 7: after voting .
EDU 8: to address this task ,
EDU 9: we exploit not only the information
EDU 10: extracted from the comments
EDU 11: but also extra-textual information such as user demographic information and inter-comment constraints .
EDU 12: in an evaluation
EDU 13: involving nearly one million comments
EDU 14: collected from the popular sodahead social polling website ,
EDU 15: we show
EDU 16: that a vote prediction system
EDU 17: that exploits only textual information
EDU 18: can be improved significantly
EDU 19: when extended with extra-textual information .
EDU 0:
EDU 1: in this paper we first exploit cash-tags ( " $ "
EDU 2: followed by stocks' ticker symbols )
EDU 3: on twitter
EDU 4: to build a stock network ,
EDU 5: where nodes are stocks
EDU 6: connected by edges
EDU 7: when two stocks co-occur frequently in tweets .
EDU 8: we then employ a labeled topic model
EDU 9: to jointly model both the tweets and the network structure
EDU 10: to assign each node and each edge a topic respectively .
EDU 11: this semantic stock network ( ssn ) summarizes discussion topics about stocks and stock relations .
EDU 12: we further show
EDU 13: that social sentiment about stock ( node ) topics and stock relationship ( edge ) topics is predictive of each stock's market .
EDU 14: for prediction , we propose to regress the topic-sentiment time-series and the stock's price time series .
EDU 15: experimental results demonstrate
EDU 16: that topic sentiments from close neighbors are able to help improve the prediction of a stock markedly .
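The regression step above can be sketched in its simplest form: fit a price series on the one-step-lagged topic-sentiment series by least squares and predict the next value. A single predictor and closed-form fit are simplifying assumptions; the paper's setup regresses on multiple topic-sentiment series, including neighbors'.

```python
def fit_lagged_regression(sentiment, price):
    """Least-squares fit of price[t] = a + b * sentiment[t-1]."""
    xs = sentiment[:-1]          # sentiment lagged by one step
    ys = price[1:]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    b = cov / var
    a = my - b * mx
    return a, b

def predict_next(a, b, last_sentiment):
    """Forecast the next price from the latest sentiment reading."""
    return a + b * last_sentiment
```

Adding neighbor stocks' sentiment series as extra regressors is what lets information propagate along the semantic stock network.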
EDU 0:
EDU 1: demographic lexica have potential for widespread use in social science , economic , and business applications .
EDU 2: we derive predictive lexica
EDU 3: ( words and weights )
EDU 4: for age and gender
EDU 5: using regression and classification models from word usage in facebook , blog , and twitter data with associated demographic labels .
EDU 6: the lexica ,
EDU 7: made publicly available ,
EDU 8: achieved state-of-the-art accuracy in language based age and gender prediction over facebook and twitter ,
EDU 9: and were evaluated for generalization across social media genres as well as in limited message situations .
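Applying a predictive lexicon of (word, weight) pairs is straightforward: the prediction is an intercept plus the sum of each word's weight times its relative frequency in the text. The lexicon entries, weights, and intercept below are invented for illustration, not values from the released lexica.

```python
def lexicon_predict(text, lexicon, intercept=0.0):
    """Score a text with a predictive lexicon:
    intercept + sum(weight * relative frequency) over lexicon words."""
    tokens = text.lower().split()
    total = len(tokens)
    score = intercept
    for w in set(tokens):
        if w in lexicon:
            score += lexicon[w] * tokens.count(w) / total
    return score
```

For age the score is read as a regression estimate; for gender the same form feeds a threshold or classifier.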
EDU 0:
EDU 1: dependency parsing is a core task in nlp ,
EDU 2: and it is widely used by many applications such as information extraction , ques-tion answering , and machine translation .
EDU 3: in the era of social media , a big challenge is that parsers
EDU 4: trained on traditional newswire corpora
EDU 5: typically suffer from the domain mismatch issue ,
EDU 6: and thus perform poorly on social media data .
EDU 7: we present a new gfl/fudg-annotated chinese treebank with more than 00k tokens from sina weibo
EDU 8: ( the chinese equivalent of twitter ) .
EDU 9: we formulate the dependency parsing problem as many small and parallelizable arc prediction tasks :
EDU 10: for each task , we use a programmable probabilistic first-order logic
EDU 11: to infer the dependency arc of a token in the sentence .
EDU 12: in experiments , we show
EDU 13: that the proposed model outperforms an off-the-shelf stanford chinese parser , as well as a strong maltparser baseline
EDU 14: that is trained on the same in-domain data .
EDU 0:
EDU 1: microblog has become a major platform for information about real-world events .
EDU 2: automatically discovering real-world events from microblog has attracted the attention of many researchers .
EDU 3: however , most existing work ignores the importance of emotion information for event detection .
EDU 4: we argue
EDU 5: that people's emotional reactions immediately reflect the occurrence of real-world events
EDU 6: and should be important for event detection .
EDU 7: in this study , we focus on the problem of community-related event detection by community emotions .
EDU 8: to address the problem ,
EDU 9: we propose a novel framework
EDU 10: which includes the following three key components :
EDU 11: microblog emotion classification , community emotion aggregation and community emotion burst detection .
EDU 12: we evaluate our approach on real microblog data sets .
EDU 13: experimental results demonstrate the effectiveness of the proposed framework .
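The burst-detection component above can be sketched with a simple deviation test on the aggregated community emotion series: flag time steps whose count exceeds the mean by a chosen number of standard deviations. The threshold and counts are illustrative assumptions, not the paper's detector.

```python
def detect_bursts(counts, threshold=2.0):
    """Flag time steps whose aggregated community emotion count exceeds
    the series mean by `threshold` standard deviations."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / n
    std = var ** 0.5
    if std == 0:
        return []
    return [i for i, c in enumerate(counts) if (c - mean) / std > threshold]
```

In the full pipeline, each count would itself come from classifying microblog posts by emotion and aggregating over a community.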
EDU 0:
EDU 1: casual online forums such as reddit , slashdot and digg , are continuing to increase in popularity as a means of communication .
EDU 2: detecting disagreement in this domain is a considerable challenge .
EDU 3: many topics are unique to the conversation on the forum ,
EDU 4: and the appearance of disagreement may be much more subtle than on political blogs or social media sites such as twitter .
EDU 5: in this analysis we present a crowd-sourced annotated corpus for topic level disagreement detection in slashdot ,
EDU 6: showing
EDU 7: that disagreement detection in this domain is difficult even for humans .
EDU 8: we then proceed to show
EDU 9: that a new set of features
EDU 10: determined from the rhetorical structure of the conversation
EDU 11: significantly improves the performance on disagreement detection over a baseline
EDU 12: consisting of unigram/bigram features , discourse markers , structural features and meta-post features .
EDU 0:
EDU 1: recently , work in nlp was initiated on a type of opinion inference
EDU 2: that arises
EDU 3: when opinions are expressed toward events
EDU 4: which have positive or negative effects on entities
EDU 5: ( +/-effect events ) .
EDU 6: this paper addresses methods
EDU 7: for creating a lexicon of such events ,
EDU 8: to support such work on opinion inference .
EDU 9: due to significant sense ambiguity ,
EDU 10: our goal is to develop a sense-level rather than word-level lexicon .
EDU 11: to maximize the effectiveness of different types of information ,
EDU 12: we combine a graph-based method
EDU 13: using wordnet0 relations
EDU 14: and a standard classifier
EDU 15: using gloss information .
EDU 16: a hybrid between the two gives the best results .
EDU 17: further , we provide evidence
EDU 18: that the model is an effective way
EDU 19: to guide manual annotation
EDU 20: to find +/-effect senses
EDU 21: that are not in the seed set .
EDU 0:
EDU 1: aspect-based opinion mining has attracted lots of attention today .
EDU 2: in this paper , we address the problem of product aspect rating prediction ,
EDU 3: where we would like to extract the product aspects ,
EDU 4: and predict aspect ratings simultaneously .
EDU 5: topic models have been widely adapted
EDU 6: to jointly model aspects and sentiments ,
EDU 7: but existing models may not do the prediction task well
EDU 8: due to their weakness in sentiment extraction .
EDU 9: the sentiment topics usually do not have clear correspondence to commonly used ratings ,
EDU 10: and the model may fail to extract certain kinds of sentiments
EDU 11: due to skewed data .
EDU 12: to tackle this problem ,
EDU 13: we propose a sentiment-aligned topic model ( satm ) ,
EDU 14: where we incorporate two types of external knowledge :
EDU 15: product-level overall rating distribution and word-level sentiment lexicon .
EDU 16: experiments on a real dataset demonstrate
EDU 17: that satm is effective on product aspect rating prediction ,
EDU 18: and it achieves better performance
EDU 19: compared to the existing approaches .
EDU 0:
EDU 1: we present a weakly supervised approach
EDU 2: for learning hashtags , hashtag patterns , and phrases
EDU 3: associated with five emotions :
EDU 4: affection , anger/rage , fear/anxiety , joy , and sadness/disappointment .
EDU 5: starting with seed hashtags
EDU 6: to label an initial set of tweets ,
EDU 7: we train emotion classifiers
EDU 8: and use them
EDU 9: to learn new emotion hashtags and hashtag patterns .
EDU 10: this process then repeats in a bootstrapping framework .
EDU 11: emotion phrases are also extracted from the learned hashtags
EDU 12: and used to create phrase-based emotion classifiers .
EDU 13: we show
EDU 14: that the learned set of emotion indicators yields a substantial improvement in f-scores ,
EDU 15: ranging from +0 % to +00 % over baseline classifiers .
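The bootstrapping loop above can be sketched with a deliberately simplified "classifier": tweets containing a known emotion hashtag are treated as labeled, and hashtags co-occurring with them often enough are promoted into the lexicon for the next round. The hashtags, round count, and promotion rule are illustrative assumptions.

```python
def bootstrap_hashtags(tweets, seeds, rounds=2, min_count=2):
    """Bootstrapping sketch for one emotion: grow a hashtag lexicon by
    repeatedly promoting hashtags that co-occur with known ones.
    tweets: iterable of hashtag sets; seeds: initial emotion hashtags."""
    learned = set(seeds)
    for _ in range(rounds):
        counts = {}
        for tags in tweets:                  # each tweet = its set of hashtags
            if learned & set(tags):          # pseudo-labeled by current lexicon
                for t in tags:
                    if t not in learned:
                        counts[t] = counts.get(t, 0) + 1
        new = {t for t, c in counts.items() if c >= min_count}
        if not new:
            break
        learned |= new
    return learned
```

In the full system the promotion decision comes from trained emotion classifiers and learned hashtag patterns rather than raw co-occurrence counts, but the expand-relabel-repeat structure is the same.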
EDU 0:
EDU 1: we put forward the hypothesis
EDU 2: that high-accuracy sentiment analysis is only possible
EDU 3: if word senses with different polarity are accurately recognized .
EDU 4: we provide evidence for this hypothesis in a case study for the adjective "hard"
EDU 5: and propose contextually enhanced sentiment lexicons
EDU 6: that contain the information necessary for sentiment-relevant sense disambiguation .
EDU 7: an experimental evaluation demonstrates
EDU 8: that senses with different polarity can be distinguished well
EDU 9: using a combination of standard and novel features .
EDU 0:
EDU 1: identifying parallel web pages from bilingual web sites is a crucial step of bilingual resource construction for cross-lingual information processing .
EDU 2: in this paper , we propose a link-based approach
EDU 3: to distinguish parallel web pages from bilingual web sites .
EDU 4: compared with the existing methods ,
EDU 5: which only employ the internal translation similarity
EDU 6: ( such as content-based similarity and page structural similarity ) ,
EDU 7: we hypothesize
EDU 8: that the external translation similarity is an effective feature
EDU 9: to identify parallel web pages .
EDU 10: within a bilingual web site , web pages are interconnected by hyperlinks .
EDU 11: the basic idea of our method is that the translation similarity of two pages can be inferred from their neighbor pages ,
EDU 12: which can be adopted as an important source of external similarity .
EDU 13: thus , the translation similarity of page pairs will influence each other .
EDU 14: an iterative algorithm is developed
EDU 15: to estimate the external translation similarity and the final translation similarity .
EDU 16: both internal and external similarity measures are combined in the iterative algorithm .
EDU 17: experiments on six bilingual websites demonstrate
EDU 18: that our method is effective
EDU 19: and obtains significant improvement ( 0.0 % f-score ) over the baseline
EDU 20: which only utilizes internal translation similarity .
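The iterative estimation above can be sketched as a fixed-point update: each candidate page pair's score mixes its internal translation similarity with the average current score of its neighboring pairs (the external similarity). The mixing weight, iteration count, and toy graph are illustrative assumptions.

```python
def propagate_similarity(internal, neighbors, alpha=0.5, iters=20):
    """Iteratively combine each page pair's internal translation
    similarity with the average score of its neighbor pairs.
    internal: {pair_id: similarity}; neighbors: {pair_id: [pair_id, ...]}."""
    sim = dict(internal)
    for _ in range(iters):
        new = {}
        for pair, s_int in internal.items():
            nbrs = neighbors.get(pair, [])
            ext = sum(sim[n] for n in nbrs) / len(nbrs) if nbrs else s_int
            new[pair] = alpha * s_int + (1 - alpha) * ext
        sim = new
    return sim
```

Because each pair's score depends on its neighbors' scores and vice versa, the scores influence each other exactly as the abstract states, and repeated updates converge to a consistent combined similarity.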
EDU 0:
EDU 1: analyses of computer aided translation typically focus on either frontend interfaces and human effort , or backend translation and machine learnability of corrections .
EDU 2: however , this distinction is artificial in practice
EDU 3: since the frontend and backend must work in concert .
EDU 4: we present the first holistic , quantitative evaluation of these issues
EDU 5: by contrasting two assistive modes :
EDU 6: post-editing and interactive machine translation ( mt ) .
EDU 7: we describe a new translator interface , extensive modifications to a phrase-based mt system , and a novel objective function
EDU 8: for re-tuning to human corrections .
EDU 9: evaluation with professional bilingual translators shows
EDU 10: that post-edit is faster than interactive at the cost of translation quality for french-english and english-german .
EDU 11: however , re-tuning the mt system to interactive output leads to larger , statistically significant reductions in hter
EDU 12: versus re-tuning to post-edit .
EDU 13: analysis shows
EDU 14: that tuning directly to hter results in fine-grained corrections to subsequent machine output .
EDU 0:
EDU 1: the combinatorial space of translation derivations in phrase-based statistical machine translation is given by the intersection between a translation lattice and a target language model .
EDU 2: we replace this intractable intersection by a tractable relaxation
EDU 3: which incorporates a low-order upperbound on the language model .
EDU 4: exact optimisation is achieved through a coarse-to-fine strategy with connections to adaptive rejection sampling .
EDU 5: we perform exact optimisation with unpruned language models of order 0 to 0
EDU 6: and show search-error curves for beam search and cube pruning on standard test sets .
EDU 7: this is the first work
EDU 8: to tractably tackle exact optimisation with language models of orders higher than 0 .
EDU 0:
EDU 1: recent work by cherry ( 0000 ) has shown
EDU 2: that directly optimizing phrase-based reordering models towards bleu can lead to significant gains .
EDU 3: their approach is limited to small training sets of a few thousand sentences and a similar number of sparse features .
EDU 4: we show
EDU 5: how the expected bleu objective allows us to train a simple linear discriminative reordering model with millions of sparse features on hundreds of thousands of sentences
EDU 6: resulting in significant improvements .
EDU 7: a comparison to likelihood training demonstrates
EDU 8: that expected bleu is vastly more effective .
EDU 9: our best results improve a hierarchical lexicalized reordering baseline by up to 0.0 bleu in a single-reference setting on a french-english wmt 0000 setup .
EDU 0:
EDU 1: numerous works in statistical machine translation ( smt ) have attempted to identify better translation hypotheses
EDU 2: obtained by an initial decoding
EDU 3: using an improved , but more costly scoring function .
EDU 4: in this work , we introduce an approach
EDU 5: that takes the hypotheses
EDU 6: produced by a state-of-the-art , reranked phrase-based smt system ,
EDU 7: and explores new parts of the search space
EDU 8: by applying rewriting rules
EDU 9: selected on the basis of posterior phrase-level confidence .
EDU 10: in the medical domain , we obtain a 0.0 bleu improvement over a reranked baseline
EDU 11: exploiting the same scoring function ,
EDU 12: corresponding to a 0.0 bleu improvement over the original moses baseline .
EDU 13: we show
EDU 14: that if an indication of which phrases require rewriting is provided ,
EDU 15: our automatic rewriting procedure yields an additional improvement of 0.0 bleu .
EDU 16: various analyses ,
EDU 17: including a manual error analysis ,
EDU 18: further illustrate the good performance and potential for improvement of our approach in spite of its simplicity .
EDU 0:
EDU 1: we present methods
EDU 2: to control the lexicon size
EDU 3: when learning a combinatory categorial grammar semantic parser .
EDU 4: existing methods incrementally expand the lexicon
EDU 5: by greedily adding entries ,
EDU 6: considering a single training datapoint at a time .
EDU 7: we propose using corpus-level statistics for lexicon learning decisions .
EDU 8: we introduce voting
EDU 9: to globally consider adding entries to the lexicon ,
EDU 10: and pruning
EDU 11: to remove entries
EDU 12: no longer required to explain the training data .
EDU 13: our methods result in state-of-the-art performance on the task of executing sequences of natural language instructions ,
EDU 14: achieving up to 00 % error reduction ,
EDU 15: with lexicons
EDU 16: that are up to 00 % smaller
EDU 17: and are qualitatively less noisy .
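The voting and pruning steps above can be sketched over candidate lexicon entries: every datapoint votes for the entries that could explain it, entries with enough votes enter the lexicon, and pruning keeps only entries some datapoint actually prefers. The entry names, threshold, and preference rule are illustrative assumptions, not the paper's procedure.

```python
def vote_and_prune(candidate_sets, add_threshold=2):
    """Corpus-level lexicon learning sketch.
    candidate_sets: one list of candidate entries per training datapoint.
    Voting: add entries proposed by enough datapoints.
    Pruning: drop entries no datapoint prefers among survivors."""
    votes = {}
    for cands in candidate_sets:
        for entry in cands:
            votes[entry] = votes.get(entry, 0) + 1
    lexicon = {e for e, v in votes.items() if v >= add_threshold}
    used = set()
    for cands in candidate_sets:
        surviving = [e for e in cands if e in lexicon]
        if surviving:                     # datapoint keeps its best-voted entry
            used.add(max(surviving, key=lambda e: votes[e]))
    return used
```

The key contrast with greedy per-datapoint expansion is that both decisions here consult corpus-level counts, which is how the lexicon stays small.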
EDU 0:
EDU 1: in this paper , we demonstrate
EDU 2: that significant performance gains can be achieved in ccg semantic parsing
EDU 3: by introducing a linguistically motivated grammar induction scheme .
EDU 4: we present a new morpho-syntactic factored lexicon
EDU 5: that models systematic variations in morphology , syntax , and semantics across word classes .
EDU 6: the grammar uses domain-independent facts about the english language
EDU 7: to restrict the number of incorrect parses
EDU 8: that must be considered ,
EDU 9: thereby enabling effective learning from less data .
EDU 10: experiments in benchmark domains match previous models with one quarter of the data
EDU 11: and provide new state-of-the-art results with all available data ,
EDU 12: including up to 00 % relative test-error reduction .
EDU 0:
EDU 1: we present a model for the automatic semantic analysis of requirements elicitation documents .
EDU 2: our target semantic representation employs live sequence charts , a multi-modal visual language for scenario-based programming ,
EDU 3: which can be directly translated into executable code .
EDU 4: the architecture we propose integrates sentence-level and discourse-level processing in a generative probabilistic framework for the analysis and disambiguation of individual sentences in context .
EDU 5: we show empirically
EDU 6: that the discourse-based model consistently outperforms the sentence-based model
EDU 7: when constructing a system
EDU 8: that reflects all the static ( entities , properties ) and dynamic ( behavioral scenarios ) requirements in the document .
EDU 0:
EDU 1: we propose a novel model
EDU 2: for parsing natural language sentences into their formal semantic representations .
EDU 3: the model is able to perform integrated lexicon acquisition and semantic parsing ,
EDU 4: mapping each atomic element in a complete semantic representation to a contiguous word sequence in the input sentence in a recursive manner ,
EDU 5: where certain overlaps among such word sequences are allowed .
EDU 6: it defines distributions over the novel relaxed hybrid tree structures
EDU 7: which jointly represent both sentences and semantics .
EDU 8: such structures allow tractable dynamic programming algorithms to be developed for efficient learning and decoding .
EDU 9: trained under a discriminative setting ,
EDU 10: our model is able to incorporate a rich set of features
EDU 11: where certain unbounded long-distance dependencies can be captured in a principled manner .
EDU 12: we demonstrate through experiments
EDU 13: that by exploiting a large collection of simple features ,
EDU 14: our model is competitive with previous work
EDU 15: and achieves state-of-the-art performance on standard benchmark data across four different languages .
EDU 16: the system and code can be downloaded from http://statnlp.org/research/sp/ .
EDU 0:
EDU 1: the anchor words algorithm performs provably efficient topic model inference
EDU 2: by finding an approximate convex hull in a high-dimensional word co-occurrence space .
EDU 3: however , the existing greedy algorithm often selects poor anchor words ,
EDU 4: reducing topic quality and interpretability .
EDU 5: rather than finding an approximate convex hull in a high-dimensional space ,
EDU 6: we propose to find an exact convex hull in a visualizable 0- or 0-dimensional space .
EDU 7: such low-dimensional embeddings both improve topics
EDU 8: and clearly show users why the algorithm selects certain words .
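The exact low-dimensional hull selection described above can be sketched in a few lines: compute an exact 2-d convex hull (Andrew's monotone chain) over word coordinates and take the hull vertices as anchor candidates. This is a minimal sketch, assuming toy hand-picked 2-d coordinates in place of a real embedding of the word co-occurrence space; the word names are hypothetical.

```python
# Sketch: anchor words as vertices of an exact 2-d convex hull.
# The 2-d coordinates below are hypothetical stand-ins for a
# low-dimensional embedding of the word co-occurrence space.

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for seq, out in ((pts, lower), (reversed(pts), upper)):
        for p in seq:
            while len(out) >= 2 and cross(out[-2], out[-1], p) <= 0:
                out.pop()
            out.append(p)
    return lower[:-1] + upper[:-1]

# hypothetical 2-d embeddings of five words
embed = {"economy": (0.0, 0.0), "sports": (4.0, 0.0),
         "music": (4.0, 4.0), "science": (0.0, 4.0), "the": (2.0, 2.0)}
hull = set(convex_hull(list(embed.values())))
anchors = [w for w, p in embed.items() if p in hull]
```

Words that sit inside the hull (here the uninformative "the") are never selected, which is exactly why hull vertices make interpretable anchor candidates.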
EDU 0:
EDU 1: we generalize contrastive estimation in two ways
EDU 2: that permit adding more knowledge to unsupervised learning .
EDU 3: the first allows the modeler to specify not only the set of corrupted inputs for each observation , but also how bad each one is .
EDU 4: the second allows specifying structural preferences on the latent variable
EDU 5: used to explain the observations .
EDU 6: they require setting additional hyperparameters ,
EDU 7: which can be problematic in unsupervised learning ,
EDU 8: so we investigate new methods for unsupervised model selection and system combination .
EDU 9: we instantiate these ideas for part-of-speech induction
EDU 10: without tag dictionaries ,
EDU 11: improving over contrastive estimation as well as strong benchmarks from the pascal 0000 shared task .
EDU 0:
EDU 1: we introduce a reinforcement learning-based approach to simultaneous machine translation ( producing a translation
EDU 2: while receiving input words ) between languages with drastically different word orders :
EDU 3: from verb-final languages ( e.g. , german ) to verb-medial languages ( english ) .
EDU 4: in traditional machine translation , a translator must "wait" for source material to appear
EDU 5: before translation begins .
EDU 6: we remove this bottleneck
EDU 7: by predicting the final verb in advance .
EDU 8: we use reinforcement learning
EDU 9: to learn when to trust predictions about unseen , future portions of the sentence .
EDU 10: we also introduce an evaluation metric to measure expeditiousness and quality .
EDU 11: we show
EDU 12: that our new translation model outperforms batch and monotone translation strategies .
EDU 0:
EDU 1: the task of unsupervised induction of probabilistic context-free grammars ( pcfgs ) has attracted a lot of attention in the field of computational linguistics .
EDU 2: although it is a difficult task ,
EDU 3: work in this area is still very much in demand
EDU 4: since it can contribute to the advancement of language parsing and modelling .
EDU 5: in this work , we describe a new algorithm for pcfg induction
EDU 6: based on a principled approach
EDU 7: and capable of inducing accurate yet compact artificial natural language grammars and typical context-free grammars .
EDU 8: moreover , this algorithm can work on large grammars and datasets
EDU 9: and infers correctly even from small samples .
EDU 10: our analysis shows
EDU 11: that the types of grammars
EDU 12: induced by our algorithm
EDU 13: are , in theory , capable of modelling natural language .
EDU 14: one of our experiments shows
EDU 15: that our algorithm can potentially outperform the state-of-the-art in unsupervised parsing on the wsj00 corpus .
EDU 0:
EDU 1: a common approach in text mining tasks such as text categorization , authorship identification or plagiarism detection is to rely on features like words , part-of-speech tags , stems , or some other high-level linguistic features .
EDU 2: in this work , an approach
EDU 3: that uses character n-grams as features
EDU 4: is proposed for the task of native language identification .
EDU 5: instead of doing standard feature selection ,
EDU 6: the proposed approach combines several string kernels
EDU 7: using multiple kernel learning .
EDU 8: kernel ridge regression and kernel discriminant analysis are independently used in the learning stage .
EDU 9: the empirical results
EDU 10: obtained in all the experiments
EDU 11: conducted in this work
EDU 12: indicate
EDU 13: that the proposed approach achieves state-of-the-art performance in native language identification ,
EDU 14: reaching an accuracy
EDU 15: that is 0.0 % above the top scoring system of the 0000 nli shared task .
EDU 16: furthermore , the proposed approach has an important advantage
EDU 17: in that it is language independent and linguistic theory neutral .
EDU 18: in the cross-corpus experiment , the proposed approach shows
EDU 19: that it can also be topic independent ,
EDU 20: improving the state-of-the-art system by 00.0 % .
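The kernel combination described above can be sketched concretely: a character n-gram spectrum kernel, summed over several n with uniform weights (a simple stand-in for learned multiple-kernel-learning weights), plugged into closed-form kernel ridge regression. The toy "learner texts" and labels are hypothetical.

```python
import numpy as np
from collections import Counter

def profile(text, n):
    """Character n-gram frequency profile of a text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def spectrum_kernel(a, b, n):
    """Inner product of n-gram count profiles."""
    pa, pb = profile(a, n), profile(b, n)
    return float(sum(pa[g] * pb[g] for g in pa.keys() & pb.keys()))

def combined_kernel(a, b, ns=(2, 3)):
    # uniform-weight sum of string kernels: a simplified stand-in
    # for multiple kernel learning
    return sum(spectrum_kernel(a, b, n) for n in ns)

def kernel_ridge_fit(texts, y, lam=1.0):
    K = np.array([[combined_kernel(a, b) for b in texts] for a in texts])
    return np.linalg.solve(K + lam * np.eye(len(texts)), np.array(y, float))

def kernel_ridge_predict(texts, alpha, x):
    return sum(a * combined_kernel(x, t) for a, t in zip(alpha, texts))

# hypothetical learner texts with two fictitious native-language labels
train = ["ze cat is ze best", "ze dog is ze best",
         "the cat are good", "the dog are good"]
alpha = kernel_ridge_fit(train, [1, 1, -1, -1])
```

Because the features are raw character n-grams, nothing in this pipeline depends on a tokenizer or a linguistic theory, which is the language-independence argument made in the abstract.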
EDU 0:
EDU 1: predicting vocabulary of second language learners is essential to support their language learning ;
EDU 2: however , because of the large size of language vocabularies ,
EDU 3: we cannot collect information on the entire vocabulary .
EDU 4: for practical measurements , we need to sample a small portion of words from the entire vocabulary
EDU 5: and predict the rest of the words .
EDU 6: in this study , we propose a novel framework for this sampling method .
EDU 7: current methods rely on simple heuristic techniques
EDU 8: involving inflexible manual tuning by educational experts .
EDU 9: we formalize these heuristic techniques as a graph-based non-interactive active learning method
EDU 10: as applied to a special graph .
EDU 11: we show
EDU 12: that by extending the graph ,
EDU 13: we can support additional functionality
EDU 14: such as incorporating domain specificity
EDU 15: and sampling from multiple corpora .
EDU 16: in our experiments , we show
EDU 17: that our extended methods outperform other methods in terms of vocabulary prediction accuracy
EDU 18: when the number of samples is small .
EDU 0:
EDU 1: language transfer , the characteristic second language usage patterns
EDU 2: caused by native language interference ,
EDU 3: is investigated by second language acquisition ( sla ) researchers
EDU 4: seeking to find overused and underused linguistic features .
EDU 5: in this paper we develop and present a methodology
EDU 6: for deriving ranked lists of such features .
EDU 7: using very large learner data ,
EDU 8: we show our method's ability to find relevant candidates
EDU 9: using sophisticated linguistic features .
EDU 10: to illustrate its applicability to sla research ,
EDU 11: we formulate plausible language transfer hypotheses
EDU 12: supported by current evidence .
EDU 13: this is the first work
EDU 14: to extend native language identification to a broader linguistic interpretation of learner data
EDU 15: and address the automatic extraction of underused features on a per-native-language basis .
EDU 0:
EDU 1: languages spoken by immigrants change
EDU 2: due to contact with the local languages .
EDU 3: capturing these changes is problematic for current language technologies ,
EDU 4: which are typically developed for speakers of the standard dialect only .
EDU 5: even when dialectal variants are available for such technologies ,
EDU 6: we still need to predict
EDU 7: which dialect is being used .
EDU 8: in this study , we distinguish between the immigrant and the standard dialect of turkish
EDU 9: by focusing on light verb constructions .
EDU 10: we experiment with a number of grammatical and contextual features ,
EDU 11: achieving over 00 % accuracy ( 00 % baseline ) .
EDU 0:
EDU 1: readability is used to provide users with high-quality service in text recommendation or text visualization .
EDU 2: with the increasing use of hand-held devices , the reading device is regarded as an important factor for readability .
EDU 3: therefore , this paper investigates the relationship between readability and reading devices such as a smart phone , a tablet , and paper .
EDU 4: we suggest readability factors
EDU 5: that are strongly related with the readability of a specific device
EDU 6: by showing the correlations between various factors in each device and human-rated readability .
EDU 7: our experimental results show
EDU 8: that each device has its own readability characteristics ,
EDU 9: and thus different weights should be imposed on readability factors
EDU 10: according to the device type .
EDU 11: in order to prove the usefulness of the results ,
EDU 12: we apply the device-dependent readability to news article recommendation .
EDU 0:
EDU 1: we propose a new chinese abbreviation prediction method
EDU 2: which can incorporate rich local information
EDU 3: while generating the abbreviation globally .
EDU 4: unlike previous character tagging methods ,
EDU 5: we introduce the minimum semantic unit ,
EDU 6: which is more fine-grained than character but more coarse-grained than word ,
EDU 7: to capture word level information in the sequence labeling framework .
EDU 8: to solve the "character duplication" problem in chinese abbreviation prediction ,
EDU 9: we also use a substring tagging strategy to generate local substring tagging candidates .
EDU 10: we use an integer linear programming ( ilp ) formulation with various constraints
EDU 11: to globally decode the final abbreviation from the generated candidates .
EDU 12: experiments show
EDU 13: that our method outperforms the state-of-the-art systems ,
EDU 14: without using any extra resource .
EDU 0:
EDU 1: it has been shown
EDU 2: that news events influence the trends of stock price movements .
EDU 3: however , previous work on news-driven stock market prediction relies on shallow features
EDU 4: ( such as bags-of-words , named entities and noun phrases ) ,
EDU 5: which do not capture structured entity-relation information ,
EDU 6: and hence cannot represent complete and exact events .
EDU 7: recent advances in open information extraction ( open ie ) techniques enable the extraction of structured events from web-scale data .
EDU 8: we propose to adapt open ie technology for event-based stock price movement prediction ,
EDU 9: extracting structured events from large-scale public news
EDU 10: without manual efforts .
EDU 11: both linear and nonlinear models are employed to empirically investigate the hidden and complex relationships between events and the stock market .
EDU 12: large-scale experiments show
EDU 13: that the accuracy of s&p 000 index prediction is 00 % ,
EDU 14: and that of individual stock prediction can be over 00 % .
EDU 15: our event-based system outperforms bags-of-words-based baselines and previously reported systems
EDU 16: trained on s&p 000 stock historical data .
EDU 0:
EDU 1: automatically identifying related specialist terms is a difficult and important task
EDU 2: required to understand the lexical structure of language .
EDU 3: this paper develops a corpus-based method of extracting coherent clusters of satellite terminology ( terms on the edge of the lexicon )
EDU 4: using co-occurrence networks of unstructured text .
EDU 5: term clusters are identified
EDU 6: by extracting communities in the co-occurrence graph ,
EDU 7: after which the largest is discarded
EDU 8: and the remaining words are ranked by centrality within a community .
EDU 9: the method is tractable on large corpora ,
EDU 10: requires no document structure and only minimal normalization .
EDU 11: the results suggest
EDU 12: that the model is able to extract coherent groups of satellite terms in corpora with varying size , content and structure .
EDU 13: the findings also confirm
EDU 14: that language consists of a densely connected core
EDU 15: ( observed in dictionaries ) and systematic , semantically coherent groups of terms at the edges of the lexicon .
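The pipeline above can be sketched with standard-library code. As a simplification, connected components stand in for graph communities and plain degree stands in for centrality; both substitutions are named here so the sketch is not mistaken for the paper's exact method, and the toy corpus is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_graph(sents):
    """Undirected graph linking words that co-occur in a sentence."""
    adj = defaultdict(set)
    for sent in sents:
        for a, b in combinations(set(sent.split()), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj

def components(adj):
    """Connected components (a crude stand-in for community detection)."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            w = stack.pop()
            if w in comp:
                continue
            comp.add(w)
            stack.extend(adj[w] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def satellite_clusters(sents):
    adj = cooccurrence_graph(sents)
    comps = sorted(components(adj), key=len, reverse=True)
    # discard the largest group (the densely connected core) and rank the
    # remaining words by degree, a simple proxy for centrality
    return [sorted(c, key=lambda w: -len(adj[w])) for c in comps[1:]]

corpus = ["the cat sat on the mat", "the dog sat on the mat",
          "femur tibia fibula", "tibia fibula"]
clusters = satellite_clusters(corpus)
```

On the toy corpus the everyday words form the large core group that gets discarded, while the specialist anatomical terms survive as a coherent satellite cluster.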
EDU 0:
EDU 1: given the large amounts of online textual documents available these days , e.g. , news articles , weblogs , and scientific papers ,
EDU 2: effective methods for extracting keyphrases ,
EDU 3: which provide a high-level topic description of a document ,
EDU 4: are greatly needed .
EDU 5: in this paper , we propose a supervised model for keyphrase extraction from research papers ,
EDU 6: which are embedded in citation networks .
EDU 7: to this end , we design novel features
EDU 8: based on citation network information
EDU 9: and use them in conjunction with traditional features for keyphrase extraction
EDU 10: to obtain remarkable improvements in performance over strong baselines .
EDU 0:
EDU 1: we propose to use coreference chains
EDU 2: extracted from a large corpus as a resource for semantic tasks .
EDU 3: we extract three million coreference chains and train word embeddings on them .
EDU 4: then , we compare these embeddings to word vectors
EDU 5: derived from raw text data
EDU 6: and show
EDU 7: that coreference-based word embeddings improve f0 on the task of antonym classification by up to .00 .
EDU 0:
EDU 1: this paper proposes to apply the continuous vector representations of words
EDU 2: for discovering keywords from a financial sentiment lexicon .
EDU 3: in order to capture more keywords ,
EDU 4: we also incorporate syntactic information into the continuous bag-of-words ( cbow ) model .
EDU 5: experimental results on a task of financial risk prediction
EDU 6: using the discovered keywords demonstrate
EDU 7: that the proposed approach is good at predicting financial risk .
EDU 0:
EDU 1: when it is not possible to compare the suspicious document to the source document ( s )
EDU 2: plagiarism has been committed from ,
EDU 3: the evidence of plagiarism has to be looked for intrinsically in the document itself .
EDU 4: in this paper , we introduce a novel language-independent intrinsic plagiarism detection method
EDU 5: which is based on a new text representation
EDU 6: that we called n-gram classes .
EDU 7: the proposed method was evaluated on three publicly available standard corpora .
EDU 8: the obtained results are comparable to the ones
EDU 9: obtained by the best state-of-the-art methods .
EDU 0:
EDU 1: several recent papers on arabic dialect identification have hinted
EDU 2: that using a word unigram model is sufficient and effective for the task .
EDU 3: however , most previous work was done on a standard fairly homogeneous dataset of dialectal user comments .
EDU 4: in this paper , we show
EDU 5: that training on the standard dataset does not generalize ,
EDU 6: because a unigram model may be tuned to topics in the comments
EDU 7: and does not capture the distinguishing features of dialects .
EDU 8: we show
EDU 9: that effective dialect identification requires
EDU 10: that we account for the distinguishing lexical , morphological , and phonological phenomena of dialects .
EDU 11: we show
EDU 12: that accounting for these phenomena can improve dialect detection accuracy by nearly 00 % absolute .
EDU 0:
EDU 1: in this paper , we explore the use of keyboard strokes as a means
EDU 2: to access the real-time writing process of online authors , analogously to prosody in speech analysis , in the context of deception detection .
EDU 3: we show
EDU 4: that differences in keystroke patterns like editing maneuvers and duration of pauses can help distinguish between truthful and deceptive writing .
EDU 5: empirical results show
EDU 6: that incorporating keystroke-based features leads to improved performance in deception detection in two different domains :
EDU 7: online reviews and essays .
EDU 0:
EDU 1: statistical language modeling ( lm )
EDU 2: that purports to quantify the acceptability of a given piece of text
EDU 3: has long been an interesting yet challenging research area .
EDU 4: in particular , language modeling for information retrieval ( ir ) has enjoyed remarkable empirical success ;
EDU 5: one emerging stream of the lm approach for ir is to employ the pseudo-relevance feedback process
EDU 6: to enhance the representation of an input query
EDU 7: so as to improve retrieval effectiveness .
EDU 8: this paper presents a continuation of such a general line of research
EDU 9: and the main contribution is threefold .
EDU 10: first , we propose a principled framework
EDU 11: which can unify the relationships among several widely-used query modeling formulations .
EDU 12: second , on top of the successfully developed framework , we propose an extended query modeling formulation
EDU 13: by incorporating critical query-specific information cues
EDU 14: to guide the model estimation .
EDU 15: third , we further adapt and formalize such a framework for the speech recognition and summarization tasks .
EDU 16: a series of empirical experiments reveal the feasibility of such an lm framework and the performance merits of the deduced models on these two tasks .
EDU 0:
EDU 1: we study the topic dynamics of interactions in political debates
EDU 2: using the 0000 republican presidential primary debates as data .
EDU 3: we show
EDU 4: that the tendency of candidates to shift topics changes over the course of the election campaign ,
EDU 5: and that it is correlated with their relative power .
EDU 6: we also show
EDU 7: that our topic shift features help predict candidates' relative rankings .
EDU 0:
EDU 1: we present power low rank ensembles ( plre ) , a flexible framework for n-gram language modeling
EDU 2: where ensembles of low rank matrices and tensors are used
EDU 3: to obtain smoothed probability estimates of words in context .
EDU 4: our method can be understood as a generalization of n-gram modeling to non-integer n ,
EDU 5: and includes standard techniques such as absolute discounting and kneser-ney smoothing as special cases .
EDU 6: plre training is efficient
EDU 7: and our approach outperforms state-of-the-art modified kneser ney baselines in terms of perplexity on large corpora as well as on bleu score in a downstream machine translation task .
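Since the ensemble framework above includes absolute discounting as a special case, that special case is worth a concrete sketch: subtract a fixed discount d from every seen bigram count and redistribute the freed mass to the unigram distribution. This is a minimal stand-alone sketch of interpolated absolute discounting on a toy corpus, not the paper's low-rank method.

```python
from collections import Counter

def absolute_discount_bigram(tokens, d=0.5):
    """Interpolated absolute discounting for a bigram model:
    p(w|h) = max(c(h,w) - d, 0) / c(h) + d * N1+(h) / c(h) * p_uni(w)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = sum(unigrams.values())
    vocab = set(tokens)

    def prob(word, hist):
        ch = sum(c for (h, _), c in bigrams.items() if h == hist)
        seen = sum(1 for (h, _) in bigrams if h == hist)
        p_uni = unigrams[word] / total
        if ch == 0:                      # unseen history: back off fully
            return p_uni
        disc = max(bigrams[(hist, word)] - d, 0) / ch
        return disc + (d * seen / ch) * p_uni

    return prob, vocab

tokens = "the cat sat on the mat the cat ran".split()
prob, vocab = absolute_discount_bigram(tokens)
mass = sum(prob(w, "the") for w in vocab)
```

The redistributed mass is exactly d times the number of distinct continuations of the history, so the conditional distribution still sums to one over the vocabulary.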
EDU 0:
EDU 1: machine reading calls for programs
EDU 2: that read and understand text ,
EDU 3: but most current work only attempts to extract facts from redundant web-scale corpora .
EDU 4: in this paper , we focus on a new reading comprehension task
EDU 5: that requires complex reasoning over a single document .
EDU 6: the input is a paragraph
EDU 7: describing a biological process ,
EDU 8: and the goal is to answer questions
EDU 9: that require an understanding of the relations between entities and events in the process .
EDU 10: to answer the questions ,
EDU 11: we first predict a rich structure
EDU 12: representing the process in the paragraph .
EDU 13: then , we map the question to a formal query ,
EDU 14: which is executed against the predicted structure .
EDU 15: we demonstrate
EDU 16: that answering questions via predicted structures substantially improves accuracy over baselines
EDU 17: that use shallower representations .
EDU 0:
EDU 1: connecting words with senses , namely , sight , hearing , taste , smell and touch ,
EDU 2: to comprehend the sensorial information in language
EDU 3: is a straightforward task for humans
EDU 4: by using commonsense knowledge .
EDU 5: with this in mind , a lexicon
EDU 6: associating words with senses
EDU 7: would be crucial for the computational tasks
EDU 8: aiming at interpretation of language .
EDU 9: however , to the best of our knowledge , there is no systematic attempt in the literature
EDU 10: to build such a resource .
EDU 11: in this paper , we present a sensorial lexicon
EDU 12: that associates english words with senses .
EDU 13: to obtain this resource ,
EDU 14: we apply a computational method
EDU 15: based on bootstrapping and corpus statistics .
EDU 16: the quality of the resulting lexicon is evaluated with a gold standard
EDU 17: created via crowdsourcing .
EDU 18: the results show
EDU 19: that a simple classifier
EDU 20: relying on the lexicon
EDU 21: outperforms two baselines on a sensory classification task , both at word and sentence level ,
EDU 22: and confirm the soundness of the proposed approach for the construction of the lexicon and the usefulness of the resource for computational applications .
EDU 0:
EDU 1: statistical machine translation is quite robust
EDU 2: when it comes to the choice of input representation .
EDU 3: it only requires consistency between training and testing .
EDU 4: as a result , there is a wide range of possible preprocessing choices for data
EDU 5: used in statistical machine translation .
EDU 6: this is even more so for morphologically rich languages
EDU 7: such as arabic .
EDU 8: in this paper , we study the effect of different word-level preprocessing schemes for arabic on the quality of phrase-based statistical machine translation .
EDU 9: we also present and evaluate different methods
EDU 10: for combining preprocessing schemes
EDU 11: resulting in improved translation quality .
EDU 0:
EDU 1: this paper presents an extensive evaluation of five different alignments
EDU 2: and investigates their impact on the corresponding mt system output .
EDU 3: we introduce new measures for intrinsic evaluations
EDU 4: and examine the distribution of phrases and untranslated words
EDU 5: during decoding
EDU 6: to identify which characteristics of different alignments affect translation .
EDU 7: we show
EDU 8: that precision-oriented alignments yield better mt output
EDU 9: ( translating more words
EDU 10: and using longer phrases )
EDU 11: than recall-oriented alignments .
EDU 0:
EDU 1: we present a method for unsupervised topic modelling
EDU 2: which adapts methods
EDU 3: used in document classification ( blei et al. , 0000 ; griffiths and steyvers , 0000 )
EDU 4: to unsegmented multi-party discourse transcripts .
EDU 5: we show
EDU 6: how bayesian inference in this generative model can be used
EDU 7: to simultaneously address the problems of topic segmentation and topic identification :
EDU 8: automatically segmenting multi-party meetings into topically coherent segments with performance
EDU 9: which compares well with previous unsupervised segmentation-only methods ( galley et al. , 0000 )
EDU 10: while simultaneously extracting topics
EDU 11: which rate highly
EDU 12: when assessed for coherence by human judges .
EDU 13: we also show
EDU 14: that this method appears robust in the face of off-topic dialogue and speech recognition errors .
EDU 0:
EDU 1: we consider the task of unsupervised lecture segmentation .
EDU 2: we formalize segmentation as a graph-partitioning task
EDU 3: that optimizes the normalized cut criterion .
EDU 4: our approach moves beyond localized comparisons
EDU 5: and takes into account long-range cohesion dependencies .
EDU 6: our results demonstrate
EDU 7: that global analysis improves the segmentation accuracy
EDU 8: and is robust in the presence of speech recognition errors .
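The normalized-cut formulation above admits a standard spectral sketch: take the sign of the Fiedler vector (the second eigenvector of the normalized graph Laplacian) as an approximate two-way cut of the sentence-similarity graph. A minimal sketch, assuming a hypothetical similarity matrix for a six-sentence transcript, not the paper's full segmentation algorithm.

```python
import numpy as np

def ncut_split(W):
    """Approximate two-way normalized cut: sign of the Fiedler vector
    of the symmetric normalized Laplacian I - D^-1/2 W D^-1/2."""
    d = W.sum(axis=1)
    d_isqrt = 1.0 / np.sqrt(d)
    L = np.eye(len(W)) - d_isqrt[:, None] * W * d_isqrt[None, :]
    vals, vecs = np.linalg.eigh(L)       # eigenvalues in ascending order
    fiedler = vecs[:, 1]                 # second-smallest eigenvalue
    return fiedler >= 0

# hypothetical similarity matrix: two topically coherent blocks of
# sentences with weak cross-block similarity
W = np.full((6, 6), 0.01)
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
np.fill_diagonal(W, 0.0)
labels = ncut_split(W)
```

Because the cut is computed over the whole similarity matrix at once, weak long-range links influence the boundary, which is the global-analysis point the abstract makes against purely local comparisons.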
EDU 0:
EDU 1: we present an approach to pronoun resolution
EDU 2: based on syntactic paths .
EDU 3: through a simple bootstrapping procedure ,
EDU 4: we learn the likelihood of coreference between a pronoun and a candidate noun
EDU 5: based on the path in the parse tree between the two entities .
EDU 6: this path information enables us to handle previously challenging resolution instances ,
EDU 7: and also robustly addresses traditional syntactic coreference constraints .
EDU 8: highly coreferent paths also allow mining of precise probabilistic gender/number information .
EDU 9: we combine statistical knowledge with well known features in a support vector machine pronoun resolution classifier .
EDU 10: significant gains in performance are observed on several datasets .
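The core statistic above, a coreference likelihood per syntactic path, reduces to smoothed counting. This sketch uses hypothetical path strings and invented bootstrapped counts purely for illustration.

```python
from collections import defaultdict

class PathLikelihood:
    """Likelihood that a (pronoun, candidate) pair corefers, keyed by the
    parse-tree path between them and estimated from observed counts."""

    def __init__(self):
        self.coref = defaultdict(int)
        self.total = defaultdict(int)

    def observe(self, path, is_coref):
        self.total[path] += 1
        self.coref[path] += int(is_coref)

    def likelihood(self, path, prior=0.5, strength=1.0):
        # smooth rare or unseen paths toward a neutral prior
        return (self.coref[path] + strength * prior) / (self.total[path] + strength)

model = PathLikelihood()
# hypothetical bootstrapped observations over two path strings
for _ in range(8):
    model.observe("NP<S>VP>NP", True)     # e.g. subject-antecedent paths
for _ in range(2):
    model.observe("NP<S>VP>NP", False)
for _ in range(9):
    model.observe("NP<NP>PP>NP", False)   # e.g. paths violating binding
model.observe("NP<NP>PP>NP", True)
```

These per-path likelihoods are then just one feature among the well-known ones fed to the downstream classifier; an unseen path falls back to the neutral prior rather than a hard decision.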
EDU 0:
EDU 1: syntactic knowledge is important for pronoun resolution .
EDU 2: traditionally , the syntactic information for pronoun resolution is represented in terms of features
EDU 3: that have to be selected and defined heuristically .
EDU 4: in the paper , we propose a kernel-based method
EDU 5: that can automatically mine the syntactic information from the parse trees for pronoun resolution .
EDU 6: specifically , we utilize the parse trees directly as a structured feature
EDU 7: and apply kernel functions to this feature , as well as other normal features ,
EDU 8: to learn the resolution classifier .
EDU 9: in this way , our approach avoids the efforts
EDU 10: of decoding the parse trees into the set of flat syntactic features .
EDU 11: the experimental results show
EDU 12: that our approach can bring significant performance improvement
EDU 13: and is reliably effective for the pronoun resolution task .
EDU 0:
EDU 1: it has previously been assumed in the psycholinguistic literature
EDU 2: that finite-state models of language are crucially limited in their explanatory power by the locality of the probability distribution and the narrow scope of information
EDU 3: used by the model .
EDU 4: we show
EDU 5: that a simple computational model ( a bigram part-of-speech tagger
EDU 6: based on the design
EDU 7: used by corley and crocker ( 0000 ) )
EDU 8: makes correct predictions on processing difficulty
EDU 9: observed in a wide range of empirical sentence processing data .
EDU 10: we use two modes of evaluation :
EDU 11: one
EDU 12: that relies on comparison with a control sentence ,
EDU 13: paralleling practice in human studies ;
EDU 14: another
EDU 15: that measures probability drop in the disambiguating region of the sentence .
EDU 16: both are surprisingly good indicators of the processing difficulty of garden-path sentences .
EDU 17: the sentences tested are drawn from published sources
EDU 18: and systematically explore five different types of ambiguity :
EDU 19: previous studies have been narrower in scope
EDU 20: and smaller in scale .
EDU 21: we do not deny the limitations of finite-state models ,
EDU 22: but argue
EDU 23: that our results show
EDU 24: that their usefulness has been underestimated .
EDU 0:
EDU 1: we propose in this paper a method
EDU 2: for quantifying sentence grammaticality .
EDU 3: the approach
EDU 4: based on property grammars , a constraint-based syntactic formalism ,
EDU 5: makes it possible to evaluate a grammaticality index for any kind of sentence ,
EDU 6: including ill-formed ones .
EDU 7: we compare on a sample of sentences the grammaticality indices
EDU 8: obtained from the pg formalism
EDU 9: and the acceptability judgements
EDU 10: measured by means of a psycholinguistic analysis .
EDU 11: the results show
EDU 12: that the derived grammaticality index is a fairly good tracer of acceptability scores .
EDU 0:
EDU 1: in this paper we present a novel approach
EDU 2: for inducing word alignments from sentence aligned data .
EDU 3: we use a conditional random field ( crf ) , a discriminative model ,
EDU 4: which is estimated on a small supervised training set .
EDU 5: the crf is conditioned on both the source and target texts ,
EDU 6: and thus allows for the use of arbitrary and overlapping features over these data .
EDU 7: moreover , the crf has efficient training and decoding processes
EDU 8: which both find globally optimal solutions .
EDU 9: we apply this alignment model to both french-english and romanian-english language pairs .
EDU 10: we show
EDU 11: how a large number of highly predictive features can be easily incorporated into the crf ,
EDU 12: and demonstrate
EDU 13: that even with only a few hundred word-aligned training sentences , our model improves over the current state-of-the-art with alignment error rates of 0.00 and 00.0 for the two tasks respectively .
EDU 0:
EDU 1: in this paper we investigate chinese-english name transliteration
EDU 2: using comparable corpora , corpora
EDU 3: where texts in the two languages deal in some of the same topics —
EDU 4: and therefore share references to named entities —
EDU 5: but are not translations of each other .
EDU 6: we present two distinct methods for transliteration ,
EDU 7: one approach
EDU 8: using phonetic transliteration ,
EDU 9: and the second
EDU 10: using the temporal distribution of candidate pairs .
EDU 11: each of these approaches works quite well ,
EDU 12: but by combining the approaches
EDU 13: one can achieve even better results .
EDU 14: we then propose a novel score propagation method
EDU 15: that utilizes the co-occurrence of transliteration pairs within document pairs .
EDU 16: this propagation method achieves further improvement over the best results from the previous step .
EDU 0:
EDU 1: we present a novel method
EDU 2: for extracting parallel sub-sentential fragments from comparable , non-parallel bilingual corpora .
EDU 3: by analyzing potentially similar sentence pairs
EDU 4: using a signal processing-inspired approach ,
EDU 5: we detect
EDU 6: which segments of the source sentence are translated into segments in the target sentence ,
EDU 7: and which are not .
EDU 8: this method enables us to extract useful machine translation training data even from very non-parallel corpora ,
EDU 9: which contain no parallel sentence pairs .
EDU 10: we evaluate the quality of the extracted data
EDU 11: by showing
EDU 12: that it improves the performance of a state-of-the-art statistical machine translation system .
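The signal-processing flavor of the detection step can be sketched as follows: score each source word +1 if a bilingual lexicon links it into the target sentence and -1 otherwise, smooth the resulting signal with a moving average, and keep the maximal positive runs as candidate fragments. The toy lexicon and sentence pair are hypothetical.

```python
def extract_fragments(src_words, tgt_words, lexicon, win=3):
    """Signal-style parallel-fragment detection: +1/-1 word signal,
    moving-average smoothing, then maximal positive runs."""
    tgt = set(tgt_words)
    signal = [1.0 if any(t in tgt for t in lexicon.get(w, ())) else -1.0
              for w in src_words]
    half = win // 2
    smooth = [sum(signal[max(0, i - half):i + half + 1]) /
              len(signal[max(0, i - half):i + half + 1])
              for i in range(len(signal))]
    fragments, cur = [], []
    for w, s in zip(src_words, smooth):
        if s > 0:
            cur.append(w)
        elif cur:
            fragments.append(cur)
            cur = []
    if cur:
        fragments.append(cur)
    return fragments

# hypothetical toy lexicon and comparable sentence pair
lexicon = {"la": ["the"], "casa": ["house"], "verde": ["green"]}
frags = extract_fragments(
    ["la", "casa", "verde", "y", "el", "gato", "rojo"],
    ["the", "green", "house"], lexicon)
```

Smoothing is what makes this tolerant of gaps: an isolated lexicon miss inside a translated stretch is averaged away instead of splitting the fragment.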
EDU 0:
EDU 1: instances of a word
EDU 2: drawn from different domains
EDU 3: may have different sense priors ( the proportions of the different senses of a word ) .
EDU 4: this in turn affects the accuracy of word sense disambiguation ( wsd ) systems
EDU 5: trained and applied on different domains .
EDU 6: this paper presents a method
EDU 7: to estimate the sense priors of words
EDU 8: drawn from a new domain ,
EDU 9: and highlights the importance
EDU 10: of using well calibrated probabilities
EDU 11: when performing these estimations .
EDU 12: by using well calibrated probabilities ,
EDU 13: we are able to estimate the sense priors effectively
EDU 14: to achieve significant improvements in wsd accuracy .
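Estimating new-domain priors from calibrated classifier outputs is commonly done with an EM fixed-point iteration (the well-known Saerens et al. scheme, used here as a sketch rather than as this paper's exact procedure). The toy posteriors below are hypothetical, for a two-sense word whose new domain is skewed toward sense 0.

```python
import numpy as np

def adjust_priors(posteriors, train_priors, iters=50):
    """EM re-estimation of class priors on a new domain from the
    calibrated posteriors of a classifier trained under train_priors."""
    P = np.asarray(posteriors, float)
    p_train = np.asarray(train_priors, float)
    p_new = p_train.copy()
    for _ in range(iters):
        w = P * (p_new / p_train)           # E-step: reweight posteriors
        w /= w.sum(axis=1, keepdims=True)   # renormalize per instance
        p_new = w.mean(axis=0)              # M-step: updated prior
    return p_new

# hypothetical calibrated posteriors for two senses on new-domain instances
post = [[0.9, 0.1], [0.8, 0.2], [0.9, 0.1], [0.7, 0.3], [0.1, 0.9]]
priors = adjust_priors(post, [0.5, 0.5])
```

The reweighting step divides by the training priors, which is exactly why calibration matters: if the posteriors are systematically over- or under-confident, the fixed point drifts away from the true sense distribution.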
EDU 0:
EDU 1: combination methods are an effective way
EDU 2: of improving system performance .
EDU 3: this paper examines the benefits of system combination for unsupervised wsd .
EDU 4: we investigate several voting- and arbiter-based combination strategies over a diverse pool of unsupervised wsd systems .
EDU 5: our combination methods rely on predominant senses
EDU 6: which are derived automatically from raw text .
EDU 7: experiments
EDU 8: using the semcor and senseval-0 data sets
EDU 9: demonstrate
EDU 10: that our ensembles yield significantly better results
EDU 11: when compared with the state of the art .
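one strategy in this family can be sketched as majority voting with a predominant-sense back-off; the function name and the tie-breaking policy below are illustrative assumptions, not the paper's exact combination method.

```python
from collections import Counter

def majority_vote(predictions, predominant_sense):
    # predictions: one sense choice per unsupervised wsd system.
    # ties fall back to the predominant sense derived from raw text.
    counts = Counter(predictions).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return predominant_sense
    return counts[0][0]
```

arbiter-based variants would instead let a designated system decide only the cases where the voters disagree.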
EDU 0:
EDU 1: fine-grained sense distinctions are one of the major obstacles to successful word sense disambiguation .
EDU 2: in this paper , we present a method
EDU 3: for reducing the granularity of the wordnet sense inventory
EDU 4: based on the mapping to a manually crafted dictionary
EDU 5: encoding sense hierarchies ,
EDU 6: namely the oxford dictionary of english .
EDU 7: we assess the quality of the mapping and the induced clustering ,
EDU 8: and evaluate the performance of coarse wsd systems in the senseval-0 english all-words task .
EDU 0:
EDU 1: in this paper , we present espresso , a weakly-supervised , general-purpose , and accurate algorithm
EDU 2: for harvesting semantic relations .
EDU 3: the main contributions are :
EDU 4: i ) a method for exploiting generic patterns
EDU 5: by filtering incorrect instances
EDU 6: using the web ;
EDU 7: and ii ) a principled measure of pattern and instance reliability
EDU 8: enabling the filtering algorithm .
EDU 9: we present an empirical comparison of espresso with various state-of-the-art systems , on different size and genre corpora ,
EDU 10: on extracting various general and specific relations .
EDU 11: experimental results show
EDU 12: that our exploitation of generic patterns substantially increases system recall with small effect on overall precision .
EDU 0:
EDU 1: this paper proposes a novel hierarchical learning strategy
EDU 2: to deal with the data sparseness problem in relation extraction
EDU 3: by modeling the commonality among related classes .
EDU 4: for each class in the hierarchy
EDU 5: either manually predefined
EDU 6: or automatically clustered ,
EDU 7: a linear discriminative function is determined in a top-down way
EDU 8: using a perceptron algorithm with the lower-level weight vector
EDU 9: derived from the upper-level weight vector .
EDU 10: as the upper-level class normally has much more positive training examples than the lower-level class ,
EDU 11: the corresponding linear discriminative function can be determined more reliably .
EDU 12: the upper-level discriminative function then can effectively guide the discriminative function learning in the lower-level ,
EDU 13: which otherwise might suffer from limited training data .
EDU 14: evaluation on the ace rdc 0000 corpus shows
EDU 15: that the hierarchical strategy substantially improves the performance by 0.0 and 0.0 in f-measure on least-frequent and medium-frequent relations respectively .
EDU 16: it also shows
EDU 17: that our system outperforms the previous best-reported system by 0.0 in f-measure on the 00 subtypes
EDU 18: using the same feature set .
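the top-down weight sharing can be sketched with a plain binary perceptron; this is a simplification of the paper's setting (binary rather than multi-class, toy data shapes), with names chosen for illustration.

```python
def perceptron(data, w, epochs=10):
    # standard binary perceptron over (feature_vector, label in {+1, -1}) pairs
    for _ in range(epochs):
        for x, y in data:
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return w

def hierarchical_train(upper_data, lower_data, dim):
    # the upper-level class, with more positive examples, is learned first;
    # the lower-level learner then starts from the upper-level weight vector
    # instead of from zero, so scarce lower-level data refines rather than
    # learns from scratch
    w_upper = perceptron(upper_data, [0.0] * dim)
    w_lower = perceptron(lower_data, list(w_upper))
    return w_upper, w_lower
```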
EDU 0:
EDU 1: shortage of manually labeled data is an obstacle to supervised relation extraction methods .
EDU 2: in this paper we investigate a graph based semi-supervised learning algorithm , a label propagation ( lp ) algorithm , for relation extraction .
EDU 3: it represents labeled and unlabeled examples and their distances as the nodes and the weights of edges of a graph ,
EDU 4: and tries to obtain a labeling function
EDU 5: to satisfy two constraints :
EDU 6: 0 ) it should be fixed on the labeled nodes ,
EDU 7: 0 ) it should be smooth on the whole graph .
EDU 8: experiment results on the ace corpus showed
EDU 9: that this lp algorithm achieves better performance than svm
EDU 10: when only very few labeled examples are available ,
EDU 11: and it also performs better than bootstrapping for the relation extraction task .
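the two graph constraints above correspond to the standard iterative propagation update: clamp the labeled nodes, and repeatedly average each unlabeled node's scores over its weighted neighbours. this toy dense-graph sketch assumes a zero diagonal and a fixed iteration count; it is not the paper's ace experimental setup.

```python
def label_propagation(weights, labels, n_iter=100):
    # weights: symmetric n x n edge weights (distances turned into
    # similarities, zero diagonal); labels: {node_index: class_label}
    classes = sorted(set(labels.values()))
    n = len(weights)
    f = [[0.0] * len(classes) for _ in range(n)]
    for node, lab in labels.items():
        f[node][classes.index(lab)] = 1.0
    for _ in range(n_iter):
        new_f = []
        for i in range(n):
            row = []
            for c in range(len(classes)):
                # smoothness: average the neighbours' current scores
                num = sum(weights[i][j] * f[j][c] for j in range(n))
                den = sum(weights[i][j] for j in range(n))
                row.append(num / den if den else 0.0)
            new_f.append(row)
        # constraint: the labeling stays fixed on the labeled nodes
        for node, lab in labels.items():
            new_f[node] = [0.0] * len(classes)
            new_f[node][classes.index(lab)] = 1.0
        f = new_f
    return [classes[max(range(len(classes)), key=lambda c: f[i][c])]
            for i in range(n)]
```

with very few labeled nodes, the unlabeled nodes inherit the label of the part of the graph they are most strongly connected to, which is the regime where lp outperforms svm in the paper.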
EDU 0:
EDU 1: this paper proposes a generic mathematical formalism for the combination of various structures :
EDU 2: strings , trees , dags , graphs and products of them .
EDU 3: the polarization of the objects of the elementary structures controls the saturation of the final structure .
EDU 4: this formalism is both elementary and powerful enough
EDU 5: to strongly simulate many grammar formalisms ,
EDU 6: such as rewriting systems , dependency grammars , tag , hpsg and lfg .
EDU 0:
EDU 1: this work provides the essential foundations for modular construction of ( typed ) unification grammars for natural languages .
EDU 2: much of the information in such grammars is encoded in the signature ,
EDU 3: and hence the key is facilitating a modularized development of type signatures .
EDU 4: we introduce a definition of signature modules
EDU 5: and show
EDU 6: how two modules combine .
EDU 7: our definitions are motivated by the actual needs of grammar developers
EDU 8: obtained through a careful examination of large scale grammars .
EDU 9: we show
EDU 10: that our definitions meet these needs
EDU 11: by conforming to a detailed set of desiderata .
EDU 0:
EDU 1: this paper investigates the use of sublexical units as a solution
EDU 2: to handling the complex morphology with productive derivational processes ,
EDU 3: in the development of a lexical functional grammar for turkish .
EDU 4: such sublexical units make it possible to expose the internal structure of words with multiple derivations to the grammar rules in a uniform manner .
EDU 5: this in turn leads to more succinct and manageable rules .
EDU 6: further , the semantics of the derivations can also be systematically reflected in a compositional way
EDU 7: by constructing pred values on the fly .
EDU 8: we illustrate
EDU 9: how we use sublexical units
EDU 10: for handling simple productive derivational morphology and more interesting cases
EDU 11: such as causativization , etc. ,
EDU 12: which change verb valency .
EDU 13: our priority is to handle several linguistic phenomena
EDU 14: in order to observe the effects of our approach on both the c-structure and the f-structure representation , and grammar writing ,
EDU 15: leaving the coverage and evaluation issues aside for the moment .
EDU 0:
EDU 1: a grammatical method
EDU 2: of combining two kinds of speech repair cues
EDU 3: is presented .
EDU 4: one cue , prosodic disjuncture , is detected by a decision tree-based ensemble classifier
EDU 5: that uses acoustic cues
EDU 6: to identify
EDU 7: where normal prosody seems to be interrupted ( lickley , 0000 ) .
EDU 8: the other cue , syntactic parallelism , codifies the expectation
EDU 9: that repairs continue a syntactic category
EDU 10: that was left unfinished in the reparandum ( levelt , 0000 ) .
EDU 11: the two cues are combined in a treebank pcfg
EDU 12: whose states are split
EDU 13: using a few simple tree transformations .
EDU 14: parsing performance on the switchboard and fisher corpora suggests
EDU 15: that these two cues help to locate speech repairs in a synergistic way .
EDU 0:
EDU 1: spoken monologues feature greater sentence length and structural complexity
EDU 2: than do spoken dialogues .
EDU 3: to achieve high parsing performance for spoken monologues ,
EDU 4: it could prove effective to simplify the structure
EDU 5: by dividing a sentence into suitable language units .
EDU 6: this paper proposes a method for dependency parsing of japanese monologues
EDU 7: based on sentence segmentation .
EDU 8: in this method , the dependency parsing is executed in two stages : at the clause level and the sentence level .
EDU 9: first , the dependencies within a clause are identified
EDU 10: by dividing a sentence into clauses
EDU 11: and executing stochastic dependency parsing for each clause .
EDU 12: next , the dependencies over clause boundaries are identified stochastically ,
EDU 13: and the dependency structure of the entire sentence is thus completed .
EDU 14: an experiment
EDU 15: using a spoken monologue corpus
EDU 16: shows this method to be effective for efficient dependency parsing of japanese monologue sentences .
EDU 0:
EDU 1: this paper describes a parser
EDU 2: which generates parse trees with empty elements
EDU 3: in which traces and fillers are co-indexed .
EDU 4: the parser is an unlexicalized pcfg parser
EDU 5: which is guaranteed to return the most probable parse .
EDU 6: the grammar is extracted from a version of the penn treebank
EDU 7: which was automatically annotated with features in the style of klein and manning ( 0000 ) .
EDU 8: the annotation includes gpsg-style slash features
EDU 9: which link traces and fillers , and other features
EDU 10: which improve the general parsing accuracy .
EDU 11: in an evaluation on the penn treebank ( marcus et al. , 0000 ) , the parser outperformed other unlexicalized pcfg parsers in terms of labeled bracketing fscore .
EDU 12: its results for the empty category prediction task and the trace-filler coindexation task exceed all previously reported results with 00.0 % and 00.0 % fscore , respectively .
EDU 0:
EDU 1: we explore the use of restricted dialogue contexts in reinforcement learning ( rl ) of effective dialogue strategies for information seeking spoken dialogue systems ( e.g. communicator ( walker et al. , 0000 ) ) .
EDU 2: the contexts
EDU 3: we use
EDU 4: are richer than previous research in this area , e.g. ( levin and pieraccini , 0000 ; scheffler and young , 0000 ; singh et al. , 0000 ; pietquin , 0000 ) ,
EDU 5: which use only slot-based information ,
EDU 6: but are much less complex than the full dialogue "information states"
EDU 7: explored in ( henderson et al. , 0000 ) ,
EDU 8: for which tractable learning is an issue .
EDU 9: we explore
EDU 10: how incrementally adding richer features allows learning of more effective dialogue strategies .
EDU 11: we use 0 user simulations
EDU 12: learned from communicator data ( walker et al. , 0000 ; georgila et al. , 0000b )
EDU 13: to explore the effects of different features on learned dialogue strategies .
EDU 14: our results show
EDU 15: that adding the dialogue moves of the last system and user turns increases the average reward of the automatically learned strategies by 00.0 % over the original ( hand-coded ) communicator systems , and by 0.0 % over a baseline rl policy
EDU 16: that uses only slot-status features .
EDU 17: we show
EDU 18: that the learned strategies exhibit an emergent "focus switching" strategy and effective use of the "give help" action .
EDU 0:
EDU 1: speech recognition problems are a reality in current spoken dialogue systems .
EDU 2: in order to better understand these phenomena ,
EDU 3: we study dependencies between speech recognition problems and several higher level dialogue factors
EDU 4: that define our notion of student state :
EDU 5: frustration/anger , certainty and correctness .
EDU 6: we apply chi square ( χ0 ) analysis to a corpus of speech-based computer tutoring dialogues
EDU 7: to discover these dependencies both within and across turns .
EDU 8: significant dependencies are combined
EDU 9: to produce interesting insights regarding speech recognition problems
EDU 10: and to propose new strategies
EDU 11: for handling these problems .
EDU 12: we also find
EDU 13: that tutoring , as a new domain for speech applications , exhibits interesting tradeoffs and new factors
EDU 14: to consider for spoken dialogue design .
EDU 0:
EDU 1: data-driven techniques have been used for many computational linguistics tasks .
EDU 2: models
EDU 3: derived from data
EDU 4: are generally more robust than hand-crafted systems
EDU 5: since they better reflect the distribution of the phenomena
EDU 6: being modeled .
EDU 7: with the availability of large corpora of spoken dialog , dialog management is now reaping the benefits of data-driven techniques .
EDU 8: in this paper , we compare two approaches
EDU 9: to modeling subtask structure in dialog :
EDU 10: a chunk-based model of subdialog sequences , and a parse-based , or hierarchical , model .
EDU 11: we evaluate these models
EDU 12: using customer agent dialogs from a catalog service domain .
EDU 0:
EDU 1: we present a new semi-supervised training procedure for conditional random fields ( crfs )
EDU 2: that can be used
EDU 3: to train sequence segmentors and labelers from a combination of labeled and unlabeled training data .
EDU 4: our approach is based on extending the minimum entropy regularization framework to the structured prediction case ,
EDU 5: yielding a training objective
EDU 6: that combines unlabeled conditional entropy with labeled conditional likelihood .
EDU 7: although the training objective is no longer concave ,
EDU 8: it can still be used to improve an initial model
EDU 9: ( e.g. obtained from supervised training ) by iterative ascent .
EDU 10: we apply our new training algorithm to the problem
EDU 11: of identifying gene and protein mentions in biological texts ,
EDU 12: and show
EDU 13: that incorporating unlabeled data improves the performance of the supervised crf in this case .
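the training objective can be illustrated on a toy one-parameter logistic model standing in for the crf: labeled conditional likelihood minus gamma times the conditional entropy on unlabeled points. the model, data shapes, and gamma value here are assumptions for illustration only.

```python
import math

def semi_supervised_objective(w, labeled, unlabeled, gamma=0.1):
    # toy 1-feature logistic model: p(y=1 | x) = sigmoid(w * x)
    def p1(x):
        return 1.0 / (1.0 + math.exp(-w * x))
    # labeled conditional log-likelihood
    ll = sum(math.log(p1(x) if y == 1 else 1.0 - p1(x)) for x, y in labeled)
    # conditional entropy on the unlabeled data: low when the model is
    # confident, so subtracting it rewards confident predictions
    ent = 0.0
    for x in unlabeled:
        p = p1(x)
        for q in (p, 1.0 - p):
            if q > 0:
                ent -= q * math.log(q)
    return ll - gamma * ent
```

as the abstract notes, this objective is not concave, so in practice it is used to improve an already-trained supervised model by iterative ascent rather than optimized from scratch.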
EDU 0:
EDU 1: this paper proposes a framework
EDU 2: for training conditional random fields ( crfs )
EDU 3: to optimize multivariate evaluation measures ,
EDU 4: including non-linear measures
EDU 5: such as f-score .
EDU 6: our proposed framework is derived from an error minimization approach
EDU 7: that provides a simple solution
EDU 8: for directly optimizing any evaluation measure .
EDU 9: specifically focusing on sequential segmentation tasks , i.e. text chunking and named entity recognition ,
EDU 10: we introduce a loss function
EDU 11: that closely reflects the target evaluation measure for these tasks , namely , segmentation f-score .
EDU 12: our experiments show
EDU 13: that our method performs better than standard crf training .
EDU 0:
EDU 1: lasso is a regularization method for parameter estimation in linear models .
EDU 2: it optimizes the model parameters with respect to a loss function subject to model complexities .
EDU 3: this paper explores the use of lasso for statistical language modeling for text input .
EDU 4: owing to the very large number of parameters ,
EDU 5: directly optimizing the penalized lasso loss function is impossible .
EDU 6: therefore , we investigate two approximation methods ,
EDU 7: the boosted lasso ( blasso ) and the forward stagewise linear regression ( fslr ) .
EDU 8: both methods ,
EDU 9: when used with the exponential loss function ,
EDU 10: bear strong resemblance to the boosting algorithm
EDU 11: which has been used as a discriminative training method for language modeling .
EDU 12: evaluations on the task of japanese text input show
EDU 13: that blasso is able to produce the best approximation to the lasso solution ,
EDU 14: and leads to a significant improvement , in terms of character error rate , over boosting and the traditional maximum likelihood estimation .
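forward stagewise linear regression itself is simple enough to sketch: repeatedly take a tiny fixed-size step on the coordinate most correlated with the current residual, which approximates the lasso path. this toy version uses squared loss on dense data; the paper pairs these methods with the exponential loss for language modeling.

```python
def forward_stagewise(X, y, eps=0.01, n_steps=2000):
    # X: list of feature rows, y: targets; tiny steps of size eps
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_steps):
        resid = [y[i] - sum(w[j] * X[i][j] for j in range(d))
                 for i in range(n)]
        # pick the coordinate most correlated with the residual
        corrs = [sum(resid[i] * X[i][j] for i in range(n)) for j in range(d)]
        j = max(range(d), key=lambda k: abs(corrs[k]))
        if abs(corrs[j]) < 1e-9:
            break
        w[j] += eps if corrs[j] > 0 else -eps
    return w
```

the incremental, coordinate-at-a-time updates are what give the method its resemblance to boosting.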
EDU 0:
EDU 1: we have developed an automated japanese essay scoring system
EDU 2: called jess .
EDU 3: the system needs expert writings rather than expert raters
EDU 4: to build the evaluation model .
EDU 5: by detecting statistical outliers of predetermined aimed essay features
EDU 6: compared with many professional writings for each prompt ,
EDU 7: our system can evaluate essays .
EDU 8: the following three features are examined :
EDU 9: ( 0 ) rhetoric — syntactic variety , or the use of various structures in the arrangement of phrases , clauses , and sentences ,
EDU 10: ( 0 ) organization — characteristics
EDU 11: associated with the orderly presentation of ideas ,
EDU 12: such as rhetorical features and linguistic cues ,
EDU 13: and ( 0 ) content — vocabulary
EDU 14: related to the topic ,
EDU 15: such as relevant information and precise or specialized vocabulary .
EDU 16: the final evaluation score is calculated
EDU 17: by deducting from a perfect score
EDU 18: assigned by a learning process
EDU 19: using editorials and columns from the mainichi daily news newspaper .
EDU 20: a diagnosis for the essay is also given .
EDU 0:
EDU 1: this paper proposes a method
EDU 2: for detecting errors in article usage and singular plural usage
EDU 3: based on the mass count distinction .
EDU 4: first , it learns decision lists from training data
EDU 5: generated automatically to distinguish mass and count nouns .
EDU 6: then , in order to improve its performance ,
EDU 7: it is augmented by feedback
EDU 8: that is obtained from the writing of learners .
EDU 9: finally , it detects errors
EDU 10: by applying rules to the mass count distinction .
EDU 11: experiments show
EDU 12: that it achieves a recall of 0.00 and a precision of 0.00
EDU 13: and outperforms other methods
EDU 14: used for comparison
EDU 15: when augmented by feedback .
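the decision-list step can be sketched as follows: rules pairing a context feature with a mass/count label are ranked by log-likelihood ratio and applied in order. the feature names, smoothing constant, and default label below are illustrative assumptions, not the paper's exact setup.

```python
import math
from collections import defaultdict

def learn_decision_list(examples, smooth=0.5):
    # examples: (context_feature, label) pairs, label in {"mass", "count"}
    counts = defaultdict(lambda: {"mass": 0, "count": 0})
    for feat, label in examples:
        counts[feat][label] += 1
    rules = []
    for feat, c in counts.items():
        label = "mass" if c["mass"] >= c["count"] else "count"
        # strength of the rule: smoothed log-likelihood ratio
        llr = abs(math.log((c["mass"] + smooth) / (c["count"] + smooth)))
        rules.append((llr, feat, label))
    rules.sort(reverse=True)  # strongest evidence first
    return rules

def classify(rules, features, default="count"):
    # the first matching rule in the ordered list decides
    for _, feat, label in rules:
        if feat in features:
            return label
    return default
```

the feedback step in the paper would then adjust these counts using learner writing before the error-detection rules are applied.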
EDU 0:
EDU 1: this paper presents a pilot study of the use of phrasal statistical machine translation ( smt ) techniques
EDU 2: to identify and correct writing errors
EDU 3: made by learners of english as a second language ( esl ) .
EDU 4: using examples of mass noun errors
EDU 5: found in the chinese learner error corpus ( clec )
EDU 6: to guide creation of an engineered training set ,
EDU 7: we show
EDU 8: that application of the smt paradigm can capture errors
EDU 9: not well addressed by widely-used proofing tools
EDU 10: designed for native speakers .
EDU 11: our system was able to correct 00.00 % of mistakes in a set of naturally occurring examples of mass noun errors
EDU 12: found on the world wide web ,
EDU 13: suggesting
EDU 14: that efforts
EDU 15: to collect alignable corpora of pre- and post-editing esl writing samples
EDU 16: can enable the development of smt-based writing assistance tools
EDU 17: capable of repairing many of the complex syntactic and lexical problems
EDU 18: found in the writing of esl learners .
EDU 0:
EDU 1: transforming syntactic representations
EDU 2: in order to improve parsing accuracy
EDU 3: has been exploited successfully in statistical parsing systems
EDU 4: using constituency-based representations .
EDU 5: in this paper , we show
EDU 6: that similar transformations can give substantial improvements also in data-driven dependency parsing .
EDU 7: experiments on the prague dependency treebank show
EDU 8: that systematic transformations of coordinate structures and verb groups result in a 00 % error reduction for a deterministic data-driven dependency parser .
EDU 9: combining these transformations with previously proposed techniques
EDU 10: for recovering nonprojective dependencies
EDU 11: leads to state-of-the-art accuracy for the given data set .
EDU 0:
EDU 1: spoken language generation for dialogue systems requires a dictionary
EDU 2: of mappings between semantic representations of concepts
EDU 3: the system wants to express
EDU 4: and realizations of those concepts .
EDU 5: dictionary creation is a costly process ;
EDU 6: it is currently done by hand for each dialogue domain .
EDU 7: we propose a novel unsupervised method
EDU 8: for learning such mappings from user reviews in the target domain ,
EDU 9: and test it on restaurant reviews .
EDU 10: we test the hypothesis
EDU 11: that user reviews
EDU 12: that provide individual ratings for distinguished attributes of the domain entity
EDU 13: make it possible to map review sentences to their semantic representation with high precision .
EDU 14: experimental analyses show
EDU 15: that the mappings learned cover most of the domain ontology ,
EDU 16: and provide good linguistic variation .
EDU 17: a subjective user evaluation shows
EDU 18: that the consistency between the semantic representations and the learned realizations is high
EDU 19: and that the naturalness of the realizations is higher than a hand-crafted baseline .
EDU 0:
EDU 1: this paper presents a method
EDU 2: for building genetic language taxonomies
EDU 3: based on a new approach to comparing lexical forms .
EDU 4: instead of comparing forms cross-linguistically ,
EDU 5: a matrix of language-internal similarities between forms is calculated .
EDU 6: these matrices are then compared
EDU 7: to give distances between languages .
EDU 8: we argue
EDU 9: that this coheres better with current thinking in linguistics and psycholinguistics .
EDU 10: an implementation of this approach ,
EDU 11: called philologicon ,
EDU 12: is described , along with its application to dyen et al.'s ( 0000 ) ninety-five wordlists from indo-european languages .
EDU 0:
EDU 1: a good dictionary contains not only many entries and a lot of information
EDU 2: concerning each one of them ,
EDU 3: but also adequate means
EDU 4: to reveal the stored information .
EDU 5: information access depends crucially on the quality of the index .
EDU 6: we will present here some ideas
EDU 7: of how a dictionary could be enhanced
EDU 8: to support a speaker/writer to find the word s/he is looking for .
EDU 9: to this end we suggest adding to an existing electronic resource an index
EDU 10: based on the notion of association .
EDU 11: we will also present preliminary work
EDU 12: of how a subset of such associations , for example , topical associations , can be acquired by filtering a network of lexical co-occurrences
EDU 13: extracted from a corpus .
EDU 0:
EDU 1: we investigate the utility of supertag information
EDU 2: for guiding an existing dependency parser of german .
EDU 3: using weighted constraints to integrate the additionally available information ,
EDU 4: the decision process of the parser is influenced
EDU 5: by changing its preferences ,
EDU 6: without excluding alternative structural interpretations from being considered .
EDU 7: the paper reports on a series of experiments
EDU 8: using varying models of supertags
EDU 9: that significantly increase the parsing accuracy .
EDU 10: in addition , an upper bound on the accuracy
EDU 11: that can be achieved with perfect supertags
EDU 12: is estimated .
EDU 0:
EDU 1: we present a novel approach
EDU 2: for discovering word categories , sets of words
EDU 3: sharing a significant aspect of their meaning .
EDU 4: we utilize meta-patterns of high frequency words and content words
EDU 5: in order to discover pattern candidates .
EDU 6: symmetric patterns are then identified
EDU 7: using graph-based measures ,
EDU 8: and word categories are created
EDU 9: based on graph clique sets .
EDU 10: our method is the first pattern-based method
EDU 11: that requires no corpus annotation or manually provided seed patterns or words .
EDU 12: we evaluate our algorithm on very large corpora in two languages ,
EDU 13: using both human judgments and wordnet based evaluation .
EDU 14: our fully unsupervised results are superior to previous work
EDU 15: that used a pos-tagged corpus , and computation times for huge corpora are orders of magnitude faster
EDU 16: than previously reported .
EDU 0:
EDU 1: we present bayesum ( for "bayesian summarization" ) , a model for sentence extraction in query-focused summarization .
EDU 2: bayesum leverages the common case
EDU 3: in which multiple documents are relevant to a single query .
EDU 4: using these documents as reinforcement for query terms ,
EDU 5: bayesum is not afflicted by the paucity of information in short queries .
EDU 6: we show
EDU 7: that approximate inference in bayesum is possible on large data sets
EDU 8: and results in a state-of-the-art summarization system .
EDU 9: furthermore , we show
EDU 10: how bayesum can be understood as a justified query expansion technique in the language modeling for ir framework .
EDU 0:
EDU 1: we present an unsupervised learning algorithm
EDU 2: that mines large text corpora for patterns
EDU 3: that express implicit semantic relations .
EDU 4: for a given input word pair x : y with some unspecified semantic relations , the corresponding output list of patterns is ranked
EDU 5: according to how well each pattern pi expresses the relations between x and y .
EDU 6: for example , given x = ostrich and y = bird ,
EDU 7: the two highest ranking output patterns are "x is the largest y" and "y such as the x" .
EDU 8: the output patterns are intended to be useful
EDU 9: for finding further pairs with the same relations ,
EDU 10: to support the construction of lexicons , ontologies , and semantic networks .
EDU 11: the patterns are sorted by pertinence ,
EDU 12: where the pertinence of a pattern pi for a word pair x : y is the expected relational similarity between the given pair and typical pairs for pi .
EDU 13: the algorithm is empirically evaluated on two tasks ,
EDU 14: solving multiple-choice sat word analogy questions and classifying semantic relations in noun-modifier pairs .
EDU 15: on both tasks , the algorithm achieves state-of-the-art results ,
EDU 16: performing significantly better than several alternative pattern ranking algorithms ,
EDU 17: based on tf-idf .
EDU 0:
EDU 1: in this paper we investigate the benefit of stochastic predictor components for the parsing quality
EDU 2: which can be obtained with a rule-based dependency grammar .
EDU 3: by including a chunker , a supertagger , a pp attacher , and a fast probabilistic parser
EDU 4: we were able to improve upon the baseline by 0.0 % ,
EDU 5: bringing the overall labelled accuracy to 00.0 % on the german negra corpus .
EDU 6: we attribute the successful integration to the ability of the underlying grammar model
EDU 7: to combine uncertain evidence in a soft manner ,
EDU 8: thus avoiding the problem of error propagation .
EDU 0:
EDU 1: we introduce an error mining technique
EDU 2: for automatically detecting errors in resources
EDU 3: that are used in parsing systems .
EDU 4: we applied this technique on parsing results
EDU 5: produced on several million words by two distinct parsing systems ,
EDU 6: which share the syntactic lexicon and the pre-parsing processing chain .
EDU 7: we were thus able to identify missing and erroneous information in these resources .
EDU 0:
EDU 1: statistical parsers
EDU 2: trained and tested on the penn wall street journal ( wsj ) treebank
EDU 3: have shown vast improvements over the last 00 years .
EDU 4: much of this improvement , however , is based upon an ever-increasing number of features
EDU 5: to be trained on ( typically ) the wsj treebank data .
EDU 6: this has led to concern
EDU 7: that such parsers may be too finely tuned to this corpus at the expense of portability to other genres .
EDU 8: such worries have merit .
EDU 9: the standard "charniak parser" checks in at a labeled precision-recall f-measure of 00.0 % on the penn wsj test set ,
EDU 10: but only 00.0 % on the test set from the brown treebank corpus .
EDU 11: this paper should allay these fears .
EDU 12: in particular , we show
EDU 13: that the reranking parser
EDU 14: described in charniak and johnson ( 0000 )
EDU 15: improves performance of the parser on brown to 00.0 % .
EDU 16: furthermore , use of the self-training techniques described in ( mcclosky et al. , 0000 )
EDU 17: raises this to 00.0 % ( an error reduction of 00 % ) again
EDU 18: without any use of labeled brown data .
EDU 19: this is remarkable
EDU 20: since training the parser and reranker on labeled brown data achieves only 00.0 % .
EDU 0:
EDU 1: lexical classes ,
EDU 2: when tailored to the application and domain in question ,
EDU 3: can provide an effective means
EDU 4: to deal with a number of natural language processing ( nlp ) tasks .
EDU 5: while manual construction of such classes is difficult ,
EDU 6: recent research shows
EDU 7: that it is possible to automatically induce verb classes from cross-domain corpora with promising accuracy .
EDU 8: we report a novel experiment
EDU 9: where similar technology is applied to the important , challenging domain of biomedicine .
EDU 10: we show
EDU 11: that the resulting classification ,
EDU 12: acquired from a corpus of biomedical journal articles ,
EDU 13: is highly accurate and strongly domain-specific .
EDU 14: it can be used
EDU 15: to aid bio-nlp directly or as useful material
EDU 16: for investigating the syntax and semantics of verbs in biomedical texts .
EDU 0:
EDU 1: various methods have been proposed for automatic synonym acquisition ,
EDU 2: as synonyms are one of the most fundamental kinds of lexical knowledge .
EDU 3: whereas many methods are based on contextual clues of words ,
EDU 4: little attention has been paid to what kind of categories of contextual information are useful for the purpose .
EDU 5: this study has experimentally investigated the impact of contextual information selection ,
EDU 6: by extracting three kinds of word relationships from corpora :
EDU 7: dependency , sentence co-occurrence , and proximity .
EDU 8: the evaluation result shows
EDU 9: that while dependency and proximity perform relatively well by themselves ,
EDU 10: combination of two or more kinds of contextual information gives more stable performance .
EDU 11: we have further investigated useful selection of dependency relations and modification categories ,
EDU 12: and it is found
EDU 13: that modification has the greatest contribution , even greater than the widely adopted subject-object combination .
EDU 0:
EDU 1: accurately representing synonymy
EDU 2: using distributional similarity
EDU 3: requires large volumes of data
EDU 4: to reliably represent infrequent words .
EDU 5: however , the naive nearest neighbour approach
EDU 6: to comparing context vectors
EDU 7: extracted from large corpora
EDU 8: scales poorly ( o ( n0 ) in the vocabulary size ) .
EDU 9: in this paper , we compare several existing approaches
EDU 10: to approximating the nearest-neighbour search for distributional similarity .
EDU 11: we investigate the trade-off between efficiency and accuracy ,
EDU 12: and find
EDU 13: that sash ( houle and sakuma , 0000 ) provides the best balance .
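the naive approach whose scaling motivates the paper can be sketched directly: compare every word's sparse context vector against every other word's by cosine similarity. the data representation here (dicts as sparse vectors) is an illustrative assumption.

```python
import math

def nearest_neighbours(vectors):
    # vectors: {word: {context_feature: weight}}; the all-pairs comparison
    # below is quadratic in the vocabulary size, which is the scaling
    # bottleneck that approximate methods such as sash address
    def cosine(u, v):
        dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in set(u) | set(v))
        nu = math.sqrt(sum(x * x for x in u.values()))
        nv = math.sqrt(sum(x * x for x in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0
    out = {}
    for w, u in vectors.items():
        out[w] = max((x for x in vectors if x != w),
                     key=lambda x: cosine(u, vectors[x]))
    return out
```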
EDU 0:
EDU 1: event-based summarization attempts to select and organize the sentences in a summary with respect to the events or the sub-events
EDU 2: that the sentences describe .
EDU 3: each event has its own internal structure ,
EDU 4: and meanwhile often relates to other events semantically , temporally , spatially , causally or conditionally .
EDU 5: in this paper , we define an event as one or more event terms along with the named entities associated ,
EDU 6: and present a novel approach
EDU 7: to derive intra- and inter- event relevance
EDU 8: using the information of internal association , semantic relatedness , distributional similarity and named entity clustering .
EDU 9: we then apply pagerank ranking algorithm
EDU 10: to estimate the significance of an event for inclusion in a summary from the event relevance derived .
EDU 11: experiments on the duc 0000 test data show
EDU 12: that the relevance of the named entities
EDU 13: involved in events
EDU 14: achieves better results
EDU 15: when their relevance is derived from the event terms
EDU 16: they are associated with .
EDU 17: it also reveals
EDU 18: that the topic-specific relevance from documents themselves outperforms the semantic relevance from a general purpose knowledge base like word-net .
EDU 0:
EDU 1: sentence compression is the task
EDU 2: of producing a summary at the sentence level .
EDU 3: this paper focuses on three aspects of this task
EDU 4: which have not received detailed treatment in the literature :
EDU 5: training requirements , scalability , and automatic evaluation .
EDU 6: we provide a novel comparison between a supervised constituent-based and a weakly supervised word-based compression algorithm
EDU 7: and examine
EDU 8: how these models port to different domains ( written vs. spoken text ) .
EDU 9: to achieve this ,
EDU 10: a human-authored compression corpus has been created
EDU 11: and our study highlights potential problems with the automatically gathered compression corpora
EDU 12: currently used .
EDU 13: finally , we assess
EDU 14: whether automatic evaluation measures can be used
EDU 15: to determine compression quality .
EDU 0:
EDU 1: ordering information is a difficult but important task for applications
EDU 2: generating natural-language text .
EDU 3: we present a bottom-up approach
EDU 4: to arranging sentences
EDU 5: extracted for multi-document summarization .
EDU 6: to capture the association and order of two textual segments ( e.g. , sentences ) ,
EDU 7: we define four criteria , chronology , topical-closeness , precedence , and succession .
EDU 8: these criteria are integrated into a criterion by a supervised learning approach .
EDU 9: we repeatedly concatenate two textual segments into one segment
EDU 10: based on the criterion
EDU 11: until we obtain the overall segment with all sentences arranged .
EDU 12: our experimental results show a significant improvement over existing sentence ordering strategies .
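The bottom-up concatenation loop described above can be sketched as greedy agglomerative merging: repeatedly join the ordered pair of segments whose concatenation scores highest under the learned criterion. The `score` function below is a toy placeholder standing in for the supervised combination of the four criteria (chronology, topical-closeness, precedence, succession); it is an assumption, not the paper's model.

```python
def order_sentences(sentences, score):
    """Greedy bottom-up ordering.

    sentences : list of sentence objects.
    score(a, b) : learned association score for placing segment b after a.
    """
    segments = [[s] for s in sentences]
    while len(segments) > 1:
        # Pick the ordered pair (i, j) whose concatenation scores highest.
        i, j = max(
            ((a, b) for a in range(len(segments)) for b in range(len(segments)) if a != b),
            key=lambda ab: score(segments[ab[0]], segments[ab[1]]),
        )
        merged = segments[i] + segments[j]
        segments = [s for k, s in enumerate(segments) if k not in (i, j)] + [merged]
    return segments[0]
```

With a score that rewards, e.g., chronological adjacency, the loop terminates with one segment containing all sentences arranged.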
EDU 0:
EDU 1: we have constructed a corpus of news articles
EDU 2: in which events are annotated for estimated bounds on their duration .
EDU 3: here we describe a method
EDU 4: for measuring inter-annotator agreement for these event duration distributions .
EDU 5: we then show
EDU 6: that machine learning techniques
EDU 7: applied to this data
EDU 8: yield coarse-grained event duration information ,
EDU 9: considerably outperforming a baseline
EDU 10: and approaching human performance .
EDU 0:
EDU 1: in this paper we define a novel similarity measure between examples of textual entailments
EDU 2: and we use it as a kernel function in support vector machines ( svms ) .
EDU 3: this allows us to automatically learn the rewrite rules
EDU 4: that describe a non-trivial set of entailment cases .
EDU 5: the experiments with the data sets of the rte 0000 challenge show an improvement of 0.0 % over the state-of-the-art methods .
EDU 0:
EDU 1: we present an efficient algorithm for the redundancy elimination problem :
EDU 2: given an underspecified semantic representation ( usr ) of a scope ambiguity ,
EDU 3: compute a usr with fewer mutually equivalent readings .
EDU 4: the algorithm operates on underspecified chart representations
EDU 5: which are derived from dominance graphs ;
EDU 6: it can be applied to the usrs
EDU 7: computed by large-scale grammars .
EDU 8: we evaluate the algorithm on a corpus ,
EDU 9: and show
EDU 10: that it reduces the degree of ambiguity significantly
EDU 11: while taking negligible runtime .
EDU 0:
EDU 1: the psycholinguistic literature provides evidence for syntactic priming , i.e. , the tendency
EDU 2: to repeat structures .
EDU 3: this paper describes a method
EDU 4: for incorporating priming into an incremental probabilistic parser .
EDU 5: three models are compared ,
EDU 6: which involve priming of rules between sentences , within sentences , and within coordinate structures .
EDU 7: these models simulate the reading time advantage for parallel structures
EDU 8: found in human data ,
EDU 9: and also yield a small increase in overall parsing accuracy .
EDU 0:
EDU 1: we present a novel classifier-based deterministic parser for chinese constituency parsing .
EDU 2: our parser computes parse trees from bottom up in one pass ,
EDU 3: and uses classifiers
EDU 4: to make shift-reduce decisions .
EDU 5: trained and evaluated on the standard training and test sets ,
EDU 6: our best model
EDU 7: ( using stacked classifiers )
EDU 8: runs in linear time
EDU 9: and has labeled precision and recall above 00 %
EDU 10: using gold-standard part-of-speech tags ,
EDU 11: surpassing the best published results .
EDU 12: our svm parser is 0-00 times faster than state-of-the-art parsers ,
EDU 13: while producing more accurate results .
EDU 14: our maxent and dtree parsers run at speeds 00-000 times faster than state-of-the-art parsers , but with 0-0 % losses in accuracy .
EDU 0:
EDU 1: we present an automatic approach to tree annotation
EDU 2: in which basic nonterminal symbols are alternately split and merged
EDU 3: to maximize the likelihood of a training treebank .
EDU 4: starting with a simple xbar grammar ,
EDU 5: we learn a new grammar
EDU 6: whose nonterminals are subsymbols of the original nonterminals .
EDU 7: in contrast with previous work ,
EDU 8: we are able to split various terminals to different degrees ,
EDU 9: as appropriate to the actual complexity in the data .
EDU 10: our grammars automatically learn the kinds of linguistic distinctions
EDU 11: exhibited in previous work on manual tree annotation .
EDU 12: on the other hand , our grammars are much more compact and substantially more accurate than previous work on automatic annotation .
EDU 13: despite its simplicity ,
EDU 14: our best grammar achieves an f0 of 00.0 % on the penn treebank ,
EDU 15: higher than fully lexicalized systems .
EDU 0:
EDU 1: partial cognates are pairs of words in two languages
EDU 2: that have the same meaning in some , but not all contexts .
EDU 3: detecting the actual meaning of a partial cognate in context can be useful for machine translation tools and for computer-assisted language learning tools .
EDU 4: in this paper we propose a supervised and a semi-supervised method
EDU 5: to disambiguate partial cognates between two languages :
EDU 6: french and english .
EDU 7: the methods use only automatically-labeled data ;
EDU 8: therefore they can be applied for other pairs of languages as well .
EDU 9: we also show
EDU 10: that our methods perform well
EDU 11: when using corpora from different domains .
EDU 0:
EDU 1: this paper investigates conceptually and empirically the novel sense matching task ,
EDU 2: which requires recognizing
EDU 3: whether the senses of two synonymous words match in context .
EDU 4: we suggest direct approaches to the problem ,
EDU 5: which avoid the intermediate step of explicit word sense disambiguation ,
EDU 6: and demonstrate their appealing advantages and stimulating potential for future research .
EDU 0:
EDU 1: this paper presents a new approach
EDU 2: based on equivalent pseudowords ( eps )
EDU 3: to tackle word sense disambiguation ( wsd ) in chinese language .
EDU 4: eps are particular artificial ambiguous words ,
EDU 5: which can be used
EDU 6: to realize unsupervised wsd .
EDU 7: a bayesian classifier is implemented
EDU 8: to test the efficacy of the ep solution on senseval-0 chinese test set .
EDU 9: the performance is better than state-of-the-art results with an average f-measure of 0.00 .
EDU 10: the experiment verifies the value of ep for unsupervised wsd .
EDU 0:
EDU 1: this paper presents techniques
EDU 2: to apply semi-crfs to named entity recognition tasks with a tractable computational cost .
EDU 3: our framework can handle an ner task
EDU 4: that has long named entities and many labels
EDU 5: which increase the computational cost .
EDU 6: to reduce the computational cost ,
EDU 7: we propose two techniques :
EDU 8: the first is the use of feature forests ,
EDU 9: which enables us to pack feature-equivalent states ,
EDU 10: and the second is the introduction of a filtering process
EDU 11: which significantly reduces the number of candidate states .
EDU 12: this framework allows us to use a rich set of features
EDU 13: extracted from the chunk-based representation
EDU 14: that can capture informative characteristics of entities .
EDU 15: we also introduce a simple trick
EDU 16: to transfer information about distant entities
EDU 17: by embedding label information into non-entity labels .
EDU 18: experimental results show
EDU 19: that our model achieves an f-score of 00.00 % on the jnlpba 0000 shared task
EDU 20: without using any external resources or post-processing techniques .
EDU 0:
EDU 1: as natural language understanding research advances towards deeper knowledge modeling ,
EDU 2: the tasks become more and more complex :
EDU 3: we are interested in more nuanced word characteristics , more linguistic properties , deeper semantic and syntactic features .
EDU 4: one such example ,
EDU 5: explored in this article ,
EDU 6: is the mention detection and recognition task in the automatic content extraction project ,
EDU 7: with the goal of identifying named , nominal or pronominal references to real-world entities—mentions—
EDU 8: and labeling them with three types of information :
EDU 9: entity type , entity subtype and mention type .
EDU 10: in this article , we investigate three methods
EDU 11: of assigning these related tags
EDU 12: and compare them on several data sets .
EDU 13: a system
EDU 14: based on the methods
EDU 15: presented in this article
EDU 16: participated and ranked very competitively in the ace'00 evaluation .
EDU 0:
EDU 1: hidden markov models ( hmms ) are powerful statistical models
EDU 2: that have found successful applications in information extraction ( ie ) .
EDU 3: in current approaches
EDU 4: to applying hmms to ie ,
EDU 5: an hmm is used
EDU 6: to model text at the document level .
EDU 7: this modelling might cause undesired redundancy in extraction in the sense
EDU 8: that more than one filler is identified and extracted .
EDU 9: we propose to use hmms
EDU 10: to model text at the segment level ,
EDU 11: in which the extraction process consists of two steps :
EDU 12: a segment retrieval step
EDU 13: followed by an extraction step .
EDU 14: in order to retrieve extraction-relevant segments from documents ,
EDU 15: we introduce a method
EDU 16: to use hmms
EDU 17: to model and retrieve segments .
EDU 18: our experimental results show
EDU 19: that the resulting segment hmm ie system not only achieves near zero extraction redundancy ,
EDU 20: but also has better overall extraction performance than traditional document hmm ie systems .
EDU 0:
EDU 1: this paper presents a new web mining scheme for parallel data acquisition .
EDU 2: based on the document object model ( dom ) ,
EDU 3: a web page is represented as a dom tree .
EDU 4: then a dom tree alignment model is proposed
EDU 5: to identify the translationally equivalent texts and hyperlinks between two parallel dom trees .
EDU 6: by tracing the identified parallel hyperlinks ,
EDU 7: parallel web documents are recursively mined .
EDU 8: compared with previous mining schemes ,
EDU 9: the benchmarks show
EDU 10: that this new mining scheme improves the mining coverage ,
EDU 11: reduces mining bandwidth ,
EDU 12: and enhances the quality of mined parallel sentences .
EDU 0:
EDU 1: this paper describes the development of questionbank , a corpus of 0000 parse-annotated questions for
EDU 2: ( i ) use in training parsers
EDU 3: employed in qa ,
EDU 4: and ( ii ) evaluation of question parsing .
EDU 5: we present a series of experiments
EDU 6: to investigate the effectiveness of questionbank as both an exclusive and supplementary training resource for a state-of-the-art parser
EDU 7: in parsing both question and non-question test sets .
EDU 8: we introduce a new method for recovering empty nodes and their antecedents
EDU 9: ( capturing long distance dependencies )
EDU 10: from parser output in cfg trees
EDU 11: using lfg f-structure reentrancies .
EDU 0:
EDU 1: we present an algorithm
EDU 2: which creates a german ccgbank
EDU 3: by translating the syntax graphs in the german tiger corpus into ccg derivation trees .
EDU 4: the resulting corpus contains 00,000 derivations ,
EDU 5: covering 00 % of all complete sentences in tiger .
EDU 6: lexicons
EDU 7: extracted from this corpus
EDU 8: contain correct lexical entries for 00 % of all known tokens in unseen text .
EDU 0:
EDU 1: for many years , statistical machine translation relied on generative models
EDU 2: to provide bilingual word alignments .
EDU 3: in 0000 , several independent efforts showed
EDU 4: that discriminative models could be used
EDU 5: to enhance or replace the standard generative approach .
EDU 6: building on this work ,
EDU 7: we demonstrate substantial improvement in word-alignment accuracy , partly through improved training methods ,
EDU 8: but predominantly through selection of more and better features .
EDU 9: our best model produces the lowest alignment error rate
EDU 10: yet reported on canadian hansards bilingual data .
EDU 0:
EDU 1: we propose a novel reordering model for phrase-based statistical machine translation ( smt )
EDU 2: that uses a maximum entropy ( maxent ) model
EDU 3: to predict reorderings of neighbor blocks ( phrase pairs ) .
EDU 4: the model provides content-dependent , hierarchical phrasal reordering with generalization based on features
EDU 5: automatically learned from a real-world bitext .
EDU 6: we present an algorithm
EDU 7: to extract all reordering events of neighbor blocks from bilingual data .
EDU 8: in our experiments on chinese-to-english translation , this maxent-based reordering model obtains significant improvements in bleu score on the nist mt-00 and iwslt-00 tasks .
EDU 0:
EDU 1: in this paper , we argue
EDU 2: that n-gram language models are not sufficient
EDU 3: to address word reordering
EDU 4: required for machine translation .
EDU 5: we propose a new distortion model
EDU 6: that can be used with existing phrase-based smt decoders
EDU 7: to address those n-gram language model limitations .
EDU 8: we present empirical results in arabic to english machine translation
EDU 9: that show statistically significant improvements
EDU 10: when our proposed model is used .
EDU 11: we also propose a novel metric
EDU 12: to measure word order similarity ( or difference ) between any pair of languages
EDU 13: based on word alignments .
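The abstract does not define its word-order similarity metric, so the sketch below is one plausible instantiation under a stated assumption: Kendall's tau over the permutation of target positions induced by a one-to-one word alignment. The function name and the assumption of a clean permutation are illustrative, not the paper's definition.

```python
def order_similarity(alignment):
    """Kendall's tau over an alignment permutation.

    alignment[i] = target position aligned to source word i
    (assumed here to be a permutation, i.e. one-to-one alignment).
    Returns 1.0 for identical order, -1.0 for fully reversed order.
    """
    n = len(alignment)
    if n < 2:
        return 1.0
    # Count source-word pairs whose relative order is preserved on the target side.
    concordant = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if alignment[i] < alignment[j]
    )
    pairs = n * (n - 1) // 2
    return 2.0 * concordant / pairs - 1.0
```

Averaged over aligned sentence pairs, such a score gives a single number characterizing how much reordering a language pair requires.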
EDU 0:
EDU 1: this paper presents a study on
EDU 2: whether and how automatically extracted keywords can be used
EDU 3: to improve text categorization .
EDU 4: in summary we show
EDU 5: that a higher performance
EDU 6: — as measured by micro-averaged f-measure on a standard text categorization collection —
EDU 7: is achieved
EDU 8: when the full-text representation is combined with the automatically extracted keywords .
EDU 9: the combination is obtained
EDU 10: by giving higher weights to words in the full-texts
EDU 11: that are also extracted as keywords .
EDU 12: we also present results for experiments
EDU 13: in which the keywords are the only input to the categorizer ,
EDU 14: either represented as unigrams or intact .
EDU 15: of these two experiments , the unigrams have the best performance ,
EDU 16: although neither performs as well as headlines only .
EDU 0:
EDU 1: words and character-bigrams are both used as features in chinese text processing tasks ,
EDU 2: but no systematic comparison or analysis of their values as features for chinese text categorization has been reported heretofore .
EDU 3: we carry out here a full performance comparison between them by experiments on various document collections
EDU 4: ( including a manually word-segmented corpus as a golden standard ) ,
EDU 5: and a semi-quantitative analysis
EDU 6: to elucidate the characteristics of their behavior ;
EDU 7: and try to provide some preliminary clues for feature term choice
EDU 8: ( in most cases , character-bigrams are better than words )
EDU 9: and dimensionality setting in text categorization systems .
EDU 0:
EDU 1: cross-language text categorization is the task
EDU 2: of assigning semantic classes to documents
EDU 3: written in a target language ( e.g. english )
EDU 4: while the system is trained using labeled documents in a source language ( e.g. italian ) .
EDU 5: in this work we present many solutions
EDU 6: according to the availability of bilingual resources ,
EDU 7: and we show
EDU 8: that it is possible to deal with the problem
EDU 9: even when no such resources are accessible .
EDU 10: the core technique relies on the automatic acquisition of multilingual domain models from comparable corpora .
EDU 11: experiments show the effectiveness of our approach ,
EDU 12: providing a low cost solution for the cross language text categorization task .
EDU 13: in particular , when bilingual dictionaries are available
EDU 14: the performance of the categorization gets close to that of monolingual text categorization .
EDU 0:
EDU 1: recent developments in statistical modeling of various linguistic phenomena have shown
EDU 2: that additional features give consistent performance improvements .
EDU 3: quite often , improvements are limited by the number of features
EDU 4: a system is able to explore .
EDU 5: this paper describes a novel progressive training algorithm
EDU 6: that selects features from virtually unlimited feature spaces for conditional maximum entropy ( cme ) modeling .
EDU 7: experimental results in edit region identification demonstrate the benefits of the progressive feature selection ( pfs ) algorithm :
EDU 8: the pfs algorithm maintains the same accuracy performance as previous cme feature selection algorithms ( e.g. , zhou et al. , 0000 )
EDU 9: when the same feature spaces are used .
EDU 10: when additional features and their combinations are used ,
EDU 11: the pfs gives 00.00 % relative improvement over the previously reported best result in edit region identification on switchboard corpus ( kahn et al. , 0000 ) ,
EDU 12: which leads to a 00 % relative error reduction
EDU 13: in parsing the switchboard corpus
EDU 14: when gold edits are used as the upper bound .
EDU 0:
EDU 1: we first show
EDU 2: how a structural locality bias can improve the accuracy of state-of-the-art dependency grammar induction models
EDU 3: trained by em from unannotated examples ( klein and manning , 0000 ) .
EDU 4: next , by annealing the free parameter
EDU 5: that controls this bias ,
EDU 6: we achieve further improvements .
EDU 7: we then describe an alternative kind of structural bias , toward "broken" hypotheses
EDU 8: consisting of partial structures over segmented sentences ,
EDU 9: and show a similar pattern of improvement .
EDU 10: we relate this approach to contrastive estimation ( smith and eisner , 0000a ) ,
EDU 11: apply the latter to grammar induction in six languages ,
EDU 12: and show
EDU 13: that our new approach improves accuracy by 0-00 % ( absolute ) over ce ( and 0-00 % over em ) ,
EDU 14: achieving to our knowledge the best results on this task to date .
EDU 15: our method , structural annealing , is a general technique with broad applicability to hidden-structure discovery problems .
EDU 0:
EDU 1: short vowels and other diacritics are not part of written arabic scripts .
EDU 2: exceptions are made for important political and religious texts and in scripts for beginning students of arabic .
EDU 3: scripts without diacritics have considerable ambiguity
EDU 4: because many words with different diacritic patterns appear identical in a diacritic-less setting .
EDU 5: we propose in this paper a maximum entropy approach
EDU 6: for restoring diacritics in a document .
EDU 7: the approach can easily integrate and make effective use of diverse types of information ;
EDU 8: the model we propose integrates a wide array of lexical , segment-based and part-of-speech tag features .
EDU 9: the combination of these feature types leads to a state-of-the-art diacritization model .
EDU 10: using a publicly available corpus ( ldc's arabic treebank part 0 ) ,
EDU 11: we achieve a diacritic error rate of 0.0 % , a segment error rate 0.0 % , and a word error rate of 00.0 % .
EDU 12: in case-ending-less setting , we obtain a diacritic error rate of 0.0 % , a segment error rate 0.0 % , and a word error rate of 0.0 % .
EDU 0:
EDU 1: general information retrieval systems are designed to serve all users
EDU 2: without considering individual needs .
EDU 3: in this paper , we propose a novel approach to personalized search .
EDU 4: it can , in a unified way , exploit and utilize implicit feedback information , such as query logs and immediately viewed documents .
EDU 5: moreover , our approach can implement result re-ranking and query expansion simultaneously and collaboratively .
EDU 6: based on this approach ,
EDU 7: we develop a client-side personalized web search agent pair ( personalized assistant for information retrieval ) ,
EDU 8: which supports both english and chinese .
EDU 9: our experiments on trec and htrdp collections clearly show
EDU 10: that the new approach is both effective and efficient .
EDU 0:
EDU 1: this paper explores the relationship between the translation quality and the retrieval effectiveness in machine translation ( mt ) based cross-language information retrieval ( clir ) .
EDU 2: to obtain mt systems of different translation quality ,
EDU 3: we degrade a rule-based mt system
EDU 4: by decreasing the size of the rule base and the size of the dictionary .
EDU 5: we use the degraded mt systems
EDU 6: to translate queries
EDU 7: and submit the translated queries
EDU 8: of varying quality to the ir system .
EDU 9: retrieval effectiveness is found to correlate highly with the translation quality of the queries .
EDU 10: we further analyze the factors
EDU 11: that affect the retrieval effectiveness .
EDU 12: title queries are found to be preferred in mt-based clir .
EDU 13: in addition , dictionary-based degradation is shown to have stronger impact than rule-based degradation in mt-based clir .
EDU 0:
EDU 1: the trend in information retrieval systems is from document to sub-document retrieval , such as sentences in a summarization system and words or phrases in a question-answering system .
EDU 2: despite this trend ,
EDU 3: systems continue to model language at a document level
EDU 4: using the inverse document frequency ( idf ) .
EDU 5: in this paper , we compare and contrast idf with inverse sentence frequency ( isf ) and inverse term frequency ( itf ) .
EDU 6: a direct comparison reveals
EDU 7: that all language models are highly correlated ;
EDU 8: however , the average isf and itf values are 0.0 and 00.0 higher than idf .
EDU 9: all language models appeared to follow a power law distribution with a slope coefficient of 0.0 for documents and 0.0 for sentences and terms .
EDU 10: we conclude with an analysis of idf stability with respect to random , journal , and section partitions of the 000,000 full-text scientific articles in our experimental corpus .
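The three quantities compared above share one formula applied at different granularities: log of the number of units divided by the number of units containing the term, where a "unit" is a document (idf), a sentence (isf), or a term context (itf). The sketch below illustrates this with toy data; the collections and counts are invented, not from the study.

```python
import math

def inverse_frequency(units, term):
    """log(N / n_t): N units in the collection, n_t units containing the term."""
    n_t = sum(1 for unit in units if term in unit)
    return math.log(len(units) / n_t) if n_t else float("inf")

# Toy collections: the same measure at document vs. sentence granularity.
docs = [{"cat", "dog"}, {"dog"}, {"fish"}]
sentences = [{"cat"}, {"dog"}, {"dog"}, {"fish"}]
```

Because sentences are shorter and more numerous than documents, a term occurs in a smaller fraction of them, which is consistent with the abstract's observation that average isf and itf values sit above idf.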
EDU 0:
EDU 1: we present a novel translation model
EDU 2: based on tree-to-string alignment template ( tat )
EDU 3: which describes the alignment between a source parse tree and a target string .
EDU 4: a tat is capable of generating both terminals and non-terminals
EDU 5: and performing reordering at both low and high levels .
EDU 6: the model is linguistically syntax-based
EDU 7: because tats are extracted automatically from word-aligned , source side parsed parallel texts .
EDU 8: to translate a source sentence ,
EDU 9: we first employ a parser
EDU 10: to produce a source parse tree
EDU 11: and then apply tats
EDU 12: to transform the tree into a target string .
EDU 13: our experiments show
EDU 14: that the tat-based model significantly outperforms pharaoh , a state-of-the-art decoder for phrase-based models .
EDU 0:
EDU 1: this paper proposes a named entity recognition ( ner ) method for speech recognition results
EDU 2: that uses confidence on automatic speech recognition ( asr ) as a feature .
EDU 3: the asr confidence feature indicates
EDU 4: whether each word has been correctly recognized .
EDU 5: the ner model is trained
EDU 6: using asr results with named entity ( ne ) labels as well as the corresponding transcriptions with ne labels .
EDU 7: in experiments
EDU 8: using support vector machines ( svms ) and speech data from japanese newspaper articles ,
EDU 9: the proposed method outperformed a simple application of text-based ner to asr results in ner f-measure
EDU 10: by improving precision .
EDU 11: these results show
EDU 12: that the proposed method is effective in ner for noisy inputs .
EDU 0:
EDU 1: we approach the zero-anaphora resolution problem
EDU 2: by decomposing it into intra-sentential and inter-sentential zero-anaphora resolution .
EDU 3: for the former problem , syntactic patterns of the appearance of zero-pronouns and their antecedents are useful clues .
EDU 4: taking japanese as a target language ,
EDU 5: we empirically demonstrate
EDU 6: that incorporating rich syntactic pattern features in a state-of-the-art learning-based anaphora resolution model dramatically improves the accuracy of intra-sentential zero-anaphora ,
EDU 7: which consequently improves the overall performance of zero-anaphora resolution .
EDU 0:
EDU 1: automatic word spacing is one of the important tasks in korean language processing and information retrieval .
EDU 2: since there are a number of confusing cases in word spacing of korean ,
EDU 3: there are some mistakes in many texts
EDU 4: including news articles .
EDU 5: this paper presents a highly accurate method for automatic word spacing
EDU 6: based on a self-organizing n-gram model .
EDU 7: this method is basically a variant of the n-gram model ,
EDU 8: but achieves high accuracy
EDU 9: by automatically adapting context size .
EDU 10: in order to find the optimal context size ,
EDU 11: the proposed method automatically increases the context size
EDU 12: when the contextual distribution after increasing
EDU 13: it does not agree with that of the current context .
EDU 14: it also decreases the context size
EDU 15: when the distribution of reduced context is similar to that of the current context .
EDU 16: this approach achieves high accuracy
EDU 17: by considering higher dimensional data in case of necessity ,
EDU 18: and the increased computational cost is compensated by the reduced context size .
EDU 19: the experimental results show
EDU 20: that the self-organizing structure of the n-gram model enhances the basic model .
EDU 0:
EDU 1: due to historical and cultural reasons ,
EDU 2: english phrases , especially the proper nouns and new words , frequently appear in web pages
EDU 3: written primarily in asian languages such as chinese and korean .
EDU 4: although these english terms and their equivalents in the asian languages refer to the same concept ,
EDU 5: they are erroneously treated as independent index units in traditional information retrieval ( ir ) .
EDU 6: this paper describes the degree
EDU 7: to which the problem arises in ir
EDU 8: and suggests a novel technique
EDU 9: to solve it .
EDU 10: our method firstly extracts an english phrase from asian language web pages ,
EDU 11: and then unifies the extracted phrase and its equivalent ( s ) in the language as one index unit .
EDU 12: experimental results show
EDU 13: that the high precision of our conceptual unification approach greatly improves the ir performance .
EDU 0:
EDU 1: word alignment
EDU 2: using recency-vector based approach
EDU 3: has recently become popular .
EDU 4: one major advantage of these techniques is that
EDU 5: unlike other approaches
EDU 6: they perform well
EDU 7: even if the size of the parallel corpora is small .
EDU 8: this makes these algorithms worth studying for languages
EDU 9: where resources are scarce .
EDU 10: in this work we studied the performance of two very popular recency-vector based approaches ,
EDU 11: proposed in ( fung and mckeown , 0000 ) and ( somers , 0000 ) , respectively , for word alignment in english-hindi parallel corpus .
EDU 12: but the performance of the above algorithms was not found to be satisfactory .
EDU 13: however , subsequent addition of some new constraints improved the performance of the recency-vector based alignment technique significantly for the said corpus .
EDU 14: the present paper discusses the new version of the algorithm and its performance in detail .
EDU 0:
EDU 1: this paper proposes methods
EDU 2: for extracting loanwords from cyrillic mongolian corpora
EDU 3: and producing a japanese-mongolian bilingual dictionary .
EDU 4: we extract loanwords from mongolian corpora
EDU 5: using our own handcrafted rules .
EDU 6: to complement the rule-based extraction ,
EDU 7: we also extract words in mongolian corpora
EDU 8: that are phonetically similar to japanese katakana words as loanwords .
EDU 9: in addition , we match the extracted loanwords to japanese words
EDU 10: and produce a bilingual dictionary .
EDU 11: we propose a stemming method for mongolian
EDU 12: to extract loanwords correctly .
EDU 13: we verify the effectiveness of our methods experimentally .
EDU 0:
EDU 1: morphological disambiguation is the process
EDU 2: of assigning one set of morphological features to each individual word in a text .
EDU 3: when the word is ambiguous
EDU 4: ( there are several possible analyses for the word ) ,
EDU 5: a disambiguation procedure
EDU 6: based on the word context
EDU 7: must be applied .
EDU 8: this paper deals with morphological disambiguation of the hebrew language ,
EDU 9: which combines morphemes into a word in both agglutinative and fusional ways .
EDU 10: we present an unsupervised stochastic model -
EDU 11: the only resource we use is a morphological analyzer -
EDU 12: which deals with the data sparseness problem
EDU 13: caused by the affixational morphology of the hebrew language .
EDU 14: we present a text encoding method for languages with affixational morphology
EDU 15: in which the knowledge of word formation rules
EDU 16: ( which are quite restricted in hebrew )
EDU 17: helps in the disambiguation .
EDU 18: we adapt hmm algorithms
EDU 19: for learning and searching this text representation ,
EDU 20: in such a way that segmentation and tagging can be learned in parallel in one step .
EDU 21: results on a large scale evaluation indicate
EDU 22: that this learning improves disambiguation for complex tag sets .
EDU 23: our method is applicable to other languages with affix morphology .
EDU 0:
EDU 1: developing better methods for segmenting continuous text into words is important
EDU 2: for improving the processing of asian languages ,
EDU 3: and may shed light on how humans learn to segment speech .
EDU 4: we propose two new bayesian word segmentation methods
EDU 5: that assume unigram and bigram models of word dependencies respectively .
EDU 6: the bigram model greatly outperforms the unigram model ( and previous probabilistic models ) ,
EDU 7: demonstrating the importance of such dependencies for word segmentation .
EDU 8: we also show
EDU 9: that previous probabilistic models rely crucially on suboptimal search procedures .
EDU 0:
EDU 1: we present magead , a morphological analyzer and generator for the arabic language family .
EDU 2: our work is novel
EDU 3: in that it explicitly addresses the need
EDU 4: for processing the morphology of the dialects .
EDU 5: magead performs an on-line analysis to or generation from a root+pattern+features representation ,
EDU 6: it has separate phonological and orthographic representations ,
EDU 7: and it allows for combining morphemes from different dialects .
EDU 8: we present a detailed evaluation of magead .
EDU 0:
EDU 1: we present a method for noun phrase chunking in hebrew .
EDU 2: we show
EDU 3: that the traditional definition of base-nps as nonrecursive noun phrases does not apply in hebrew ,
EDU 4: and propose an alternative definition of simple nps .
EDU 5: we review syntactic properties of hebrew
EDU 6: related to noun phrases ,
EDU 7: which indicate
EDU 8: that the task of hebrew simple np chunking is harder than base-np chunking in english .
EDU 9: as a confirmation , we apply methods
EDU 10: known to work well for english to hebrew data .
EDU 11: these methods give low results ( f from 00 to 00 ) in hebrew .
EDU 12: we then discuss our method ,
EDU 13: which applies svm induction over lexical and morphological features .
EDU 14: morphological features improve the average precision by ~0.0 % , recall by ~0 % , and f-measure by ~0.00 ,
EDU 15: resulting in a system with average performance of 00 % precision , 00.0 % recall and 00.0 f-measure .
EDU 0:
EDU 1: with performance above 00 % accuracy for newspaper text ,
EDU 2: part of speech ( pos ) tagging might be considered a solved problem .
EDU 3: previous studies have shown
EDU 4: that allowing the parser to resolve pos tag ambiguity does not improve performance .
EDU 5: however , for grammar formalisms
EDU 6: which use more fine-grained grammatical categories ,
EDU 7: for example tag and ccg ,
EDU 8: tagging accuracy is much lower .
EDU 9: in fact , for these formalisms , premature ambiguity resolution makes parsing infeasible .
EDU 10: we describe a multi-tagging approach
EDU 11: which maintains a suitable level of lexical category ambiguity for accurate and efficient ccg parsing .
EDU 12: we extend this multi-tagging approach to the pos level
EDU 13: to overcome errors
EDU 14: introduced by automatically assigned pos tags .
EDU 15: although pos tagging accuracy seems high ,
EDU 16: maintaining some pos tag ambiguity in the language processing pipeline results in more accurate ccg supertagging .
EDU 0:
EDU 1: in this paper , we present a method
EDU 2: for guessing pos tags of unknown words
EDU 3: using local and global information .
EDU 4: although many existing methods use only local information
EDU 5: ( i.e. limited window size or intra-sentential features ) ,
EDU 6: global information ( extra-sentential features ) provides valuable clues
EDU 7: for predicting pos tags of unknown words .
EDU 8: we propose a probabilistic model for pos guessing of unknown words
EDU 9: using global information as well as local information ,
EDU 10: and estimate its parameters
EDU 11: using gibbs sampling .
EDU 12: we also attempt to apply the model to semisupervised learning ,
EDU 13: and conduct experiments on multiple corpora .
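the abstract does not give the model's details ; purely as an illustration of the kind of gibbs sampling used for such parameter estimation , here is a toy sampler in which the tag of each occurrence of an unknown word is resampled from a conditional that multiplies a local score with a smoothed global count over the word's other occurrences ( the names and the scoring scheme are hypothetical , not the paper's model ) :

```python
import random

def gibbs_sample_tags(tokens, tagset, local_score, n_iters=100, alpha=1.0):
    """toy gibbs sampler : resample each token's tag from a conditional
    proportional to local_score ( local clue ) times a smoothed count of
    that tag over the other tokens ( global clue ) ."""
    tags = [random.choice(tagset) for _ in tokens]
    for _ in range(n_iters):
        for i, tok in enumerate(tokens):
            weights = []
            for t in tagset:
                global_count = sum(1 for j, tg in enumerate(tags)
                                   if j != i and tg == t)
                weights.append(local_score(tok, t) * (global_count + alpha))
            # draw a tag proportionally to its weight
            r = random.random() * sum(weights)
            chosen = tagset[-1]
            for t, w in zip(tagset, weights):
                r -= w
                if r <= 0:
                    chosen = t
                    break
            tags[i] = chosen
    return tags
```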
EDU 0:
EDU 1: in this paper , we present a novel global reordering model
EDU 2: that can be incorporated into standard phrase-based statistical machine translation .
EDU 3: unlike previous local reordering models
EDU 4: that emphasize the reordering of adjacent phrase pairs ( tillmann and zhang , 0000 ) ,
EDU 5: our model explicitly models long-distance reordering
EDU 6: by directly estimating the parameters from the phrase alignments of bilingual training sentences .
EDU 7: in principle , the global phrase reordering model is conditioned on the source and target phrases
EDU 8: that are currently being translated ,
EDU 9: and the previously translated source and target phrases .
EDU 10: to cope with sparseness ,
EDU 11: we use n-best phrase alignments and bilingual phrase clustering ,
EDU 12: and investigate a variety of combinations of conditioning factors .
EDU 13: through experiments ,
EDU 14: we show
EDU 15: that the global reordering model significantly improves the translation accuracy of a standard japanese-english translation task .
EDU 0:
EDU 1: this paper presents a novel training algorithm for a linearly-scored block sequence translation model .
EDU 2: the key component is a new procedure
EDU 3: to directly optimize the global scoring function
EDU 4: used by a smt decoder .
EDU 5: no translation , language , or distortion model probabilities are used as in earlier work on smt .
EDU 6: therefore our method ,
EDU 7: which employs less domain specific knowledge ,
EDU 8: is both simpler and more extensible than previous approaches .
EDU 9: moreover , the training procedure treats the decoder as a black-box ,
EDU 10: and thus can be used
EDU 11: to optimize any decoding scheme .
EDU 12: the training algorithm is evaluated on a standard arabic-english translation task .
EDU 0:
EDU 1: the noisy channel model approach is successfully applied to various natural language processing tasks .
EDU 2: currently the main research focus of this approach is adaptation methods ,
EDU 3: i.e. how to capture characteristics of words and expressions in a target domain
EDU 4: given example sentences in that domain .
EDU 5: as a solution we describe a method
EDU 6: that enlarges the vocabulary of a language model to an almost infinite size
EDU 7: and captures the context information of its words .
EDU 8: in particular , the new method is suitable for languages
EDU 9: in which words are not delimited by whitespace .
EDU 10: we applied our method to a phoneme-to-text transcription task in japanese
EDU 11: and reduced about 00 % of the errors in the results of an existing method .
EDU 0:
EDU 1: call centers handle customer queries from various domains
EDU 2: such as computer sales and support , mobile phones , car rental , etc.
EDU 3: each such domain generally has a domain model
EDU 4: which is essential to handle customer complaints .
EDU 5: these models contain common problem categories , typical customer issues and their solutions , and greeting styles .
EDU 6: currently these models are created manually over time .
EDU 7: to automate this process ,
EDU 8: we propose an unsupervised technique
EDU 9: to generate domain models automatically from call transcriptions .
EDU 10: we use a state of the art automatic speech recognition system
EDU 11: to transcribe the calls between agents and customers ,
EDU 12: which still results in high word error rates ( 00 % ) ,
EDU 13: and we show
EDU 14: that even from these noisy transcriptions of calls we can automatically build a domain model .
EDU 15: the domain model consists primarily of a topic taxonomy
EDU 16: where every node is characterized by topic ( s ) , typical questions-answers ( q&as ) , typical actions and call statistics .
EDU 17: we show
EDU 18: how such a domain model can be used for topic identification of unseen calls .
EDU 19: we also propose applications for aiding agents
EDU 20: while handling calls
EDU 21: and for agent monitoring
EDU 22: based on the domain model .
EDU 0:
EDU 1: the paper presents a new model for context-dependent interpretation of linguistic expressions about spatial proximity between objects in a natural scene .
EDU 2: the paper discusses novel psycholinguistic experimental data
EDU 3: that tests and verifies the model .
EDU 4: the model has been implemented ,
EDU 5: and enables a conversational robot
EDU 6: to identify objects in a scene through topological spatial relations
EDU 7: ( e.g. "x near y" ) .
EDU 8: the model can help motivate the choice between topological and projective prepositions .
EDU 0:
EDU 1: this paper investigates a machine learning approach for temporally ordering and anchoring events in natural language texts .
EDU 2: to address data sparseness ,
EDU 3: we used temporal reasoning as an oversampling method
EDU 4: to dramatically expand the amount of training data ,
EDU 5: resulting in predictive accuracy on link labeling as high as 00 %
EDU 6: using a maximum entropy classifier on human annotated data .
EDU 7: this method compared favorably against a series of increasingly sophisticated baselines
EDU 8: involving expansion of rules
EDU 9: derived from human intuitions .
EDU 0:
EDU 1: we present a perceptron-style discriminative approach to machine translation
EDU 2: in which large feature sets can be exploited .
EDU 3: unlike discriminative reranking approaches ,
EDU 4: our system can take advantage of learned features in all stages of decoding .
EDU 5: we first discuss several challenges to error-driven discriminative approaches .
EDU 6: in particular , we explore different ways of updating parameters
EDU 7: given a training example .
EDU 8: we find
EDU 9: that making frequent but smaller updates is preferable to making fewer but larger updates .
EDU 10: then , we discuss an array of features
EDU 11: and show both
EDU 12: how they quantitatively increase bleu score
EDU 13: and how they qualitatively interact on specific examples .
EDU 14: one particular feature
EDU 15: we investigate
EDU 16: is a novel way
EDU 17: to introduce learning into the initial phrase extraction process ,
EDU 18: which has previously been entirely heuristic .
EDU 0:
EDU 1: we introduce a semi-supervised approach
EDU 2: to training for statistical machine translation
EDU 3: that alternates the traditional expectation maximization step
EDU 4: that is applied on a large training corpus with a discriminative step
EDU 5: aimed at increasing word-alignment quality on a small , manually word-aligned sub-corpus .
EDU 6: we show
EDU 7: that our algorithm leads not only to improved alignments but also to machine translation outputs of higher quality .
EDU 0:
EDU 1: we present a hierarchical phrase-based statistical machine translation model
EDU 2: in which a target sentence is efficiently generated in left-to-right order .
EDU 3: the model is a class of synchronous-cfg with a greibach normal form-like structure for the projected production rule :
EDU 4: the paired target-side of a production rule takes a phrase prefixed form .
EDU 5: the decoder for the target-normalized form is based on an earley-style top-down parser on the source side .
EDU 6: the target-normalized form
EDU 7: coupled with our top down parser
EDU 8: implies a left-to-right generation of translations
EDU 9: which enables a straightforward integration with ngram language models .
EDU 10: our model was evaluated on a japanese-to-english newswire translation task ,
EDU 11: and showed statistically significant performance improvements against a phrase-based translation system .
EDU 0:
EDU 1: in the past years , a number of lexical association measures have been studied
EDU 2: to help extract new scientific terminology or general-language collocations .
EDU 3: the implicit assumption of this research was that newly designed term measures
EDU 4: involving more sophisticated statistical criteria
EDU 5: would outperform simple counts of cooccurrence frequencies .
EDU 6: we here explicitly test this assumption .
EDU 7: by way of four qualitative criteria ,
EDU 8: we show
EDU 9: that purely statistics-based measures reveal virtually no difference
EDU 10: compared with frequency of occurrence counts ,
EDU 11: while linguistically more informed metrics do reveal such a marked difference .
EDU 0:
EDU 1: many algorithms have been developed
EDU 2: to harvest lexical semantic resources ,
EDU 3: but few have linked the mined knowledge into formal knowledge repositories .
EDU 4: in this paper , we propose two algorithms
EDU 5: for automatically ontologizing ( attaching ) semantic relations into wordnet .
EDU 6: we present an empirical evaluation on the task
EDU 7: of attaching partof and causation relations ,
EDU 8: showing an improvement on f-score over a baseline model .
EDU 0:
EDU 1: we propose a novel algorithm
EDU 2: for inducing semantic taxonomies .
EDU 3: previous algorithms for taxonomy induction have typically focused on independent classifiers
EDU 4: for discovering new single relationships
EDU 5: based on hand-constructed or automatically discovered textual patterns .
EDU 6: by contrast , our algorithm flexibly incorporates evidence from multiple classifiers over heterogeneous relationships
EDU 7: to optimize the entire structure of the taxonomy ,
EDU 8: using knowledge of a word's coordinate terms
EDU 9: to help in determining its hypernyms , and vice versa .
EDU 10: we apply our algorithm on the problem of sense-disambiguated noun hyponym acquisition ,
EDU 11: where we combine the predictions of hypernym and coordinate term classifiers with the knowledge in a preexisting semantic taxonomy ( wordnet 0.0 ) .
EDU 12: we add 00,000 novel synsets to wordnet 0.0 at 00 % precision , a relative error reduction of 00 % over a non-joint algorithm
EDU 13: using the same component classifiers .
EDU 14: finally , we show
EDU 15: that a taxonomy
EDU 16: built using our algorithm
EDU 17: shows a 00 % relative f-score improvement over wordnet 0.0 on an independent testset of hypernym pairs .
EDU 0:
EDU 1: in a new approach to large-scale extraction of facts from unstructured text , distributional similarities become an integral part of both the iterative acquisition of high-coverage contextual extraction patterns , and the validation and ranking of candidate facts .
EDU 2: the evaluation measures the quality and coverage of facts
EDU 3: extracted from one hundred million web documents ,
EDU 4: starting from ten seed facts
EDU 5: and using no additional knowledge , lexicons or complex tools .
EDU 0:
EDU 1: named entity recognition ( ner ) is an important part of many natural language processing tasks .
EDU 2: current approaches often employ machine learning techniques
EDU 3: and require supervised data .
EDU 4: however , many languages lack such resources .
EDU 5: this paper presents an ( almost ) unsupervised learning algorithm for automatic discovery of named entities ( nes ) in a resource free language ,
EDU 6: given a bilingual corpus
EDU 7: that is weakly temporally aligned with a resource rich language .
EDU 8: nes have similar time distributions across such corpora ,
EDU 9: and often some of the tokens in a multi-word ne are transliterated .
EDU 10: we develop an algorithm
EDU 11: that exploits both observations iteratively .
EDU 12: the algorithm makes use of a new , frequency based , metric for time distributions and a resource free discriminative approach to transliteration .
EDU 13: seeded with a small number of transliteration pairs ,
EDU 14: our algorithm discovers multi-word nes ,
EDU 15: and takes advantage of a dictionary
EDU 16: ( if one exists )
EDU 17: to account for translated or partially translated nes .
EDU 18: we evaluate the algorithm on an english-russian corpus ,
EDU 19: and show a high level of ne discovery in russian .
EDU 0:
EDU 1: this paper proposes a novel composite kernel for relation extraction .
EDU 2: the composite kernel consists of two individual kernels :
EDU 3: an entity kernel
EDU 4: that allows for entity-related features
EDU 5: and a convolution parse tree kernel
EDU 6: that models syntactic information of relation examples .
EDU 7: the motivation of our method is to fully utilize the nice properties of kernel methods
EDU 8: to explore diverse knowledge for relation extraction .
EDU 9: our study illustrates
EDU 10: that the composite kernel can effectively capture both flat and structured features
EDU 11: without the need for extensive feature engineering ,
EDU 12: and can also easily scale to include more features .
EDU 13: evaluation on the ace corpus shows
EDU 14: that our method outperforms the previous best-reported methods
EDU 15: and significantly outperforms previous two dependency tree kernels for relation extraction .
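the composition of the two kernels can be sketched as a convex linear combination , which preserves kernel validity ( the weighting scheme and names here are illustrative ; the paper's actual composition may differ ) :

```python
def composite_kernel(k_entity, k_tree, alpha=0.5):
    """combine an entity kernel and a convolution parse tree kernel ; a
    convex combination of valid kernels is itself a valid kernel ."""
    def k(x, y):
        return alpha * k_entity(x, y) + (1.0 - alpha) * k_tree(x, y)
    return k
```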
EDU 0:
EDU 1: in this paper , we present a method
EDU 2: that improves japanese dependency parsing
EDU 3: by using large-scale statistical information .
EDU 4: it takes into account two kinds of information
EDU 5: not considered in previous statistical ( machine learning based ) parsing methods :
EDU 6: information about dependency relations among the case elements of a verb , and information about co-occurrence relations between a verb and its case element .
EDU 7: this information can be collected from the results of automatic dependency parsing of large-scale corpora .
EDU 8: the results of an experiment
EDU 9: in which our method was used
EDU 10: to rerank the results
EDU 11: obtained using an existing machine learning based parsing method
EDU 12: showed
EDU 13: that our method can improve the accuracy of the results
EDU 14: obtained using the existing method .
EDU 0:
EDU 1: this paper presents a hybrid approach to question answering in the clinical domain
EDU 2: that combines techniques from summarization and information retrieval .
EDU 3: we tackle a frequently-occurring class of questions
EDU 4: that takes the form "what is the best drug treatment for x ? "
EDU 5: starting from an initial set of medline citations ,
EDU 6: our system first identifies the drugs under study .
EDU 7: abstracts are then clustered
EDU 8: using semantic classes from the umls ontology .
EDU 9: finally , a short extractive summary is generated for each abstract
EDU 10: to populate the clusters .
EDU 11: two evaluations
EDU 12: —a manual one
EDU 13: focused on short answers
EDU 14: and an automatic one
EDU 15: focused on the supporting abstracts—
EDU 16: demonstrate
EDU 17: that our system compares favorably to pubmed , the search system most widely used by physicians today .
EDU 0:
EDU 1: in this paper we investigate a novel method
EDU 2: to detect asymmetric entailment relations between verbs .
EDU 3: our starting point is the idea
EDU 4: that some point-wise verb selectional preferences carry relevant semantic information .
EDU 5: experiments
EDU 6: using wordnet as a gold standard
EDU 7: show promising results .
EDU 8: where applicable ,
EDU 9: our method ,
EDU 10: used in combination with other approaches ,
EDU 11: significantly increases the performance of entailment detection .
EDU 12: a combined approach
EDU 13: including our model
EDU 14: improves the aroc by 0 % absolute with respect to standard models .
EDU 0:
EDU 1: in this paper we present
EDU 2: how the automatic extraction of events from text can be used
EDU 3: to both classify narrative texts according to plot quality
EDU 4: and produce advice in an interactive learning environment
EDU 5: intended to help students with story writing .
EDU 6: we focus on the story rewriting task ,
EDU 7: in which an exemplar story is read to the students
EDU 8: and the students rewrite the story in their own words .
EDU 9: the system automatically extracts events from the raw text ,
EDU 10: formalized as a sequence of temporally ordered predicate-arguments .
EDU 11: these events are given to a machine-learner
EDU 12: that produces a coarse-grained rating of the story .
EDU 13: the results of the machine-learner and the extracted events are then used
EDU 14: to generate fine-grained advice for the students .
EDU 0:
EDU 1: we investigate generalizations of the all-subtrees " dop " approach to unsupervised parsing .
EDU 2: unsupervised dop models assign all possible binary trees to a set of sentences
EDU 3: and next use ( a large random subset of ) all subtrees from these binary trees
EDU 4: to compute the most probable parse trees .
EDU 5: we will test both a relative frequency estimator for unsupervised dop and a maximum likelihood estimator
EDU 6: which is known to be statistically consistent .
EDU 7: we report state-of-the-art results on english ( wsj ) , german ( negra ) and chinese ( ctb ) data .
EDU 8: to the best of our knowledge this is the first paper
EDU 9: which tests a maximum likelihood estimator for dop on the wall street journal ,
EDU 10: leading to the surprising result
EDU 11: that an unsupervised parsing model beats a widely used supervised model ( a treebank pcfg ) .
EDU 0:
EDU 1: the present work advances the accuracy and training speed of discriminative parsing .
EDU 2: our discriminative parsing method has no generative component ,
EDU 3: yet surpasses a generative baseline on constituent parsing ,
EDU 4: and does so with minimal linguistic cleverness .
EDU 5: our model can incorporate arbitrary features of the input and parse state ,
EDU 6: and performs feature selection incrementally over an exponential feature space
EDU 7: during training .
EDU 8: we demonstrate the flexibility of our approach
EDU 9: by testing it with several parsing strategies and various feature sets .
EDU 10: our implementation is freely available at : http : //nlp.cs.nyu.edu/parser/ .
EDU 0:
EDU 1: we investigate prototype-driven learning for primarily unsupervised grammar induction .
EDU 2: prior knowledge is specified declaratively ,
EDU 3: by providing a few canonical examples of each target phrase type .
EDU 4: this sparse prototype information is then propagated across a corpus
EDU 5: using distributional similarity features ,
EDU 6: which augment an otherwise standard pcfg model .
EDU 7: we show
EDU 8: that distributional features are effective at distinguishing bracket labels ,
EDU 9: but not determining bracket locations .
EDU 10: to improve the quality of the induced trees ,
EDU 11: we combine our pcfg induction with the ccm model of klein and manning ( 0000 ) ,
EDU 12: which has complementary strengths :
EDU 13: it identifies brackets
EDU 14: but does not label them .
EDU 15: using only a handful of prototypes ,
EDU 16: we show substantial improvements over naive pcfg induction for english and chinese grammar induction .
EDU 0:
EDU 1: in this paper , we explore correlation of dependency relation paths
EDU 2: to rank candidate answers in answer extraction .
EDU 3: using the correlation measure ,
EDU 4: we compare dependency relations of a candidate answer and mapped question phrases in sentence with the corresponding relations in question .
EDU 5: different from previous studies ,
EDU 6: we propose an approximate phrase mapping algorithm
EDU 7: and incorporate the mapping score into the correlation measure .
EDU 8: the correlations are further incorporated into a maximum entropy-based ranking model
EDU 9: which estimates path weights from training .
EDU 10: experimental results show
EDU 11: that our method significantly outperforms state-of-the-art syntactic relation-based methods by up to 00 % in mrr .
EDU 0:
EDU 1: this paper describes an algorithm
EDU 2: for propagating verb arguments along lexical chains
EDU 3: consisting of wordnet relations .
EDU 4: the algorithm creates verb argument structures
EDU 5: using verbnet syntactic patterns .
EDU 6: in order to increase the coverage ,
EDU 7: a larger set of verb senses was automatically associated with the existing patterns from verbnet .
EDU 8: the algorithm is used in an in-house question answering system
EDU 9: for re-ranking the set of candidate answers .
EDU 10: tests on factoid questions from trec 0000 indicate
EDU 11: that the algorithm improved the system performance by 0.0 % .
EDU 0:
EDU 1: work on the semantics of questions has argued
EDU 2: that the relation between a question and its answer ( s ) can be cast in terms of logical entailment .
EDU 3: in this paper , we demonstrate
EDU 4: how computational systems
EDU 5: designed to recognize textual entailment
EDU 6: can be used
EDU 7: to enhance the accuracy of current open-domain automatic question answering ( q/a ) systems .
EDU 8: in our experiments , we show
EDU 9: that when textual entailment information is used to
EDU 10: either filter
EDU 11: or rank answers
EDU 12: returned by a q/a system ,
EDU 13: accuracy can be increased by as much as 00 % overall .
EDU 0:
EDU 1: we present a new approach
EDU 2: for mapping natural language sentences to their formal meaning representations
EDU 3: using string kernel-based classifiers .
EDU 4: our system learns these classifiers for every production in the formal language grammar .
EDU 5: meaning representations for novel natural language sentences are obtained
EDU 6: by finding the most probable semantic parse
EDU 7: using these string classifiers .
EDU 8: our experiments on two real-world data sets show
EDU 9: that this approach compares favorably to other existing systems
EDU 10: and is particularly robust to noise .
EDU 0:
EDU 1: we investigate the unsupervised detection of semi-fixed cue phrases
EDU 2: such as "this paper proposes a novel approach ... " from unseen text ,
EDU 3: on the basis of only a handful of seed cue phrases with the desired semantics .
EDU 4: the problem ,
EDU 5: in contrast to bootstrapping approaches for question answering and information extraction ,
EDU 6: is that it is hard to find a constraining context for occurrences of semi-fixed cue phrases .
EDU 7: our method uses components of the cue phrase itself , rather than external context ,
EDU 8: to bootstrap .
EDU 9: it successfully excludes phrases
EDU 10: which are different
EDU 11: from the target semantics ,
EDU 12: but which look superficially similar .
EDU 13: the method achieves 00 % accuracy ,
EDU 14: outperforming standard bootstrapping approaches .
EDU 0:
EDU 1: this article describes a robust semantic parser
EDU 2: that uses a broad knowledge base
EDU 3: created by interconnecting three major resources :
EDU 4: framenet , verbnet and propbank .
EDU 5: the framenet corpus contains the examples
EDU 6: annotated with semantic roles
EDU 7: whereas the verbnet lexicon provides the knowledge about the syntactic behavior of the verbs .
EDU 8: we connect verbnet and framenet
EDU 9: by mapping the framenet frames to the verbnet intersective levin classes .
EDU 10: the propbank corpus ,
EDU 11: which is tightly connected to the verbnet lexicon ,
EDU 12: is used
EDU 13: to increase the verb coverage
EDU 14: and also to test the effectiveness of our approach .
EDU 15: the results indicate
EDU 16: that our model is an interesting step towards the design of more robust semantic parsers .
EDU 0:
EDU 1: this paper presents the particular use of "jibiki" ( papillon's web server development platform ) for the lexalp0 project .
EDU 2: lexalp's goal is to harmonise the terminology on spatial planning and sustainable development
EDU 3: used within the alpine convention0 ,
EDU 4: so that the member states are able to cooperate and communicate efficiently in the four official languages
EDU 5: ( french , german , italian and slovene ) .
EDU 6: to this purpose , lexalp uses the jibiki platform
EDU 7: to build a term bank for the contrastive analysis of the specialised terminology
EDU 8: used in six different national legal systems and four different languages .
EDU 9: in this paper we present
EDU 10: how a generic platform like jibiki can cope with a new kind of dictionary .
EDU 0:
EDU 1: thesauri and ontologies provide important value
EDU 2: in facilitating access to digital archives
EDU 3: by representing underlying principles of organization .
EDU 4: translation of such resources into multiple languages is an important component
EDU 5: for providing multilingual access .
EDU 6: however , the specificity of vocabulary terms in most ontologies precludes fully-automated machine translation
EDU 7: using general-domain lexical resources .
EDU 8: in this paper , we present an efficient process
EDU 9: for leveraging human translations
EDU 10: when constructing domain-specific lexical resources .
EDU 11: we evaluate the effectiveness of this process
EDU 12: by producing a probabilistic phrase dictionary
EDU 13: and translating a thesaurus of 00,000 concepts
EDU 14: used to catalogue a large archive of oral histories .
EDU 15: our experiments demonstrate a cost-effective technique for accurate machine translation of large ontologies .
EDU 0:
EDU 1: this paper focuses on the use of advanced techniques of text analysis as support for collocation extraction .
EDU 2: a hybrid system is presented
EDU 3: that combines statistical methods and multilingual parsing
EDU 4: for detecting accurate collocational information from english , french , spanish and italian corpora .
EDU 5: the advantage of relying on full parsing over
EDU 6: using a traditional window method
EDU 7: ( which ignores the syntactic information )
EDU 8: is first theoretically motivated ,
EDU 9: then empirically validated by a comparative evaluation experiment .
EDU 0:
EDU 1: statistical mt has made great progress in the last few years ,
EDU 2: but current translation models are weak on re-ordering and target language fluency .
EDU 3: syntactic approaches seek to remedy these problems .
EDU 4: in this paper , we take the framework
EDU 5: for acquiring multi-level syntactic translation rules of ( galley et al. , 0000 ) from aligned tree-string pairs ,
EDU 6: and present two main extensions of their approach :
EDU 7: first , instead of merely computing a single derivation
EDU 8: that minimally explains a sentence pair ,
EDU 9: we construct a large number of derivations
EDU 10: that include contextually richer rules ,
EDU 11: and account for multiple interpretations of unaligned words .
EDU 12: second , we propose probability estimates and a training procedure
EDU 13: for weighting these rules .
EDU 14: we contrast different approaches on real examples ,
EDU 15: show
EDU 16: that our estimates
EDU 17: based on multiple derivations favor phrasal re-orderings
EDU 18: that are linguistically better motivated ,
EDU 19: and establish
EDU 20: that our larger rules provide a 0.00 bleu point increase over minimal rules .
EDU 0:
EDU 1: certain distinctions
EDU 2: made in the lexicon of one language
EDU 3: may be redundant
EDU 4: when translating into another language .
EDU 5: we quantify redundancy among source types by the similarity of their distributions over target types .
EDU 6: we propose a language-independent framework
EDU 7: for minimising lexical redundancy
EDU 8: that can be optimised directly from parallel text .
EDU 9: optimisation of the source lexicon for a given target language is viewed as model selection over a set of cluster-based translation models .
EDU 10: redundant distinctions between types may exhibit monolingual regularities , for example , inflexion patterns .
EDU 11: we define a prior over model structure
EDU 12: using a markov random field
EDU 13: and learn features over sets of monolingual types
EDU 14: that are predictive of bilingual redundancy .
EDU 15: the prior makes model selection more robust
EDU 16: without the need for language-specific assumptions
EDU 17: regarding redundancy .
EDU 18: using these models in a phrase-based smt system ,
EDU 19: we show significant improvements in translation quality for certain language pairs .
EDU 0:
EDU 1: this paper describes a study of the patterns of translational equivalence
EDU 2: exhibited by a variety of bitexts .
EDU 3: the study found
EDU 4: that the complexity of these patterns in every bitext was higher
EDU 5: than suggested in the literature .
EDU 6: these findings shed new light on why " syntactic " constraints have not helped to improve statistical translation models ,
EDU 7: including finite-state phrase-based models , tree-to-string models , and tree-to-tree models .
EDU 8: the paper also presents evidence
EDU 9: that inversion transduction grammars cannot generate some translational equivalence relations ,
EDU 10: even in relatively simple real bitexts in syntactically similar languages with rigid word order .
EDU 11: instructions for replicating our experiments are at http : //nlp.cs.nyu.edu/genpar/acl00 .
EDU 0:
EDU 1: we propose a new hierarchical bayesian n-gram model of natural languages .
EDU 2: our model makes use of a generalization of the commonly used dirichlet distributions
EDU 3: called pitman-yor processes
EDU 4: which produce power-law distributions
EDU 5: more closely resembling those in natural languages .
EDU 6: we show
EDU 7: that an approximation to the hierarchical pitman-yor language model recovers the exact formulation of interpolated kneser-ney , one of the best smoothing methods for n-gram language models .
EDU 8: experiments verify
EDU 9: that our model gives cross entropy results
EDU 10: superior to interpolated kneser-ney
EDU 11: and comparable to modified kneser-ney .
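interpolated kneser-ney , which the abstract relates to the hierarchical pitman-yor model , can be sketched for the bigram case ( a minimal illustration with a single fixed discount , not the authors' implementation ) :

```python
from collections import Counter

def kneser_ney_bigram(tokens, d=0.75):
    """minimal interpolated kneser-ney bigram model : discount observed
    bigram counts by d and back off to a continuation probability based
    on the number of distinct contexts each word follows ."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    unigrams = Counter(tokens[:-1])                   # context counts c(u)
    followers = Counter(u for (u, w) in bigrams)      # n1+( u , . )
    continuations = Counter(w for (u, w) in bigrams)  # n1+( . , w )
    n_bigram_types = len(bigrams)

    def prob(u, w):
        p_cont = continuations[w] / n_bigram_types
        if unigrams[u] == 0:
            return p_cont
        discounted = max(bigrams[(u, w)] - d, 0.0) / unigrams[u]
        backoff = d * followers[u] / unigrams[u]
        return discounted + backoff * p_cont
    return prob
```

for any seen context the probabilities sum to one , since the mass removed by discounting is exactly redistributed through the continuation distribution .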
EDU 0:
EDU 1: chatting is a popular communication medium on the internet via icq , chat rooms , etc.
EDU 2: chat language is different from natural language
EDU 3: due to its anomalous and dynamic natures ,
EDU 4: which renders conventional nlp tools inapplicable .
EDU 5: the dynamic problem is enormously troublesome
EDU 6: because it quickly makes a static chat language corpus outdated
EDU 7: in representing contemporary chat language .
EDU 8: to address the dynamic problem ,
EDU 9: we propose the phonetic mapping models
EDU 10: to represent mappings between chat terms and standard words via phonetic transcription , i.e. chinese pinyin in our case .
EDU 11: different from character mappings ,
EDU 12: the phonetic mappings can be constructed from available standard chinese corpus .
EDU 13: to perform the task of dynamic chat language term normalization ,
EDU 14: we extend the source channel model
EDU 15: by incorporating the phonetic mapping models .
EDU 16: experimental results show
EDU 17: that this method is effective and stable
EDU 18: in normalizing dynamic chat language terms .
EDU 0:
EDU 1: this paper presents a discriminative pruning method of n-gram language model for chinese word segmentation .
EDU 2: to reduce the size of the language model
EDU 3: that is used in a chinese word segmentation system ,
EDU 4: the importance of each bigram is computed in terms of a discriminative pruning criterion
EDU 5: that is related to the performance loss
EDU 6: caused by pruning the bigram .
EDU 7: then we propose a step-by-step growing algorithm
EDU 8: to build the language model of desired size .
EDU 9: experimental results show
EDU 10: that the discriminative pruning method leads to a much smaller model
EDU 11: compared with the model
EDU 12: pruned using the state-of-the-art method .
EDU 13: at the same chinese word segmentation f-measure , the number of bigrams in the model can be reduced by up to 00 % .
EDU 14: correlation between language model perplexity and word segmentation performance is also discussed .
EDU 0:
EDU 1: a web search with double checking model is proposed
EDU 2: to explore the web as a live corpus .
EDU 3: five association measures
EDU 4: including variants of dice , overlap ratio , jaccard , and cosine , as well as co-occurrence double check ( codc ) ,
EDU 5: are presented .
EDU 6: in the experiments on rubenstein-goodenough's benchmark data set , the codc measure achieves a correlation coefficient of 0.0000 ,
EDU 7: which competes with the performance ( 0.0000 ) of the model
EDU 8: using wordnet .
EDU 9: the experiments on link detection of named entities
EDU 10: using the strategies of direct association , association matrix and scalar association matrix
EDU 11: verify
EDU 12: that the double-check frequencies are reliable .
EDU 13: further study on named entity clustering shows
EDU 14: that the five measures are quite useful .
EDU 15: in particular , codc measure is very stable on word-word and name-name experiments .
EDU 16: the application of codc measure
EDU 17: to expand community chains for personal name disambiguation
EDU 18: achieves 0.00 % and 00.00 % increase
EDU 19: compared to the system without community expansion .
EDU 20: all the experiments illustrate
EDU 21: that the novel model of web search with double checking is feasible for mining associations from the web .
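The four classical association measures named above can be computed from web hit counts f(x), f(y), and f(x, y); the codc measure is omitted here because its exact definition is not given in the abstract. A minimal sketch:

```python
import math

def dice(fx, fy, fxy):
    """Dice coefficient from individual and joint hit counts."""
    return 2 * fxy / (fx + fy) if fx + fy else 0.0

def overlap(fx, fy, fxy):
    """Overlap ratio: joint count relative to the rarer term."""
    return fxy / min(fx, fy) if min(fx, fy) else 0.0

def jaccard(fx, fy, fxy):
    """Jaccard coefficient over the union of occurrences."""
    return fxy / (fx + fy - fxy) if fx + fy - fxy else 0.0

def cosine(fx, fy, fxy):
    """Cosine measure from hit counts."""
    return fxy / math.sqrt(fx * fy) if fx and fy else 0.0
```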
EDU 0:
EDU 1: this paper introduces a novel framework for the accurate retrieval of relational concepts from huge texts .
EDU 2: prior to retrieval ,
EDU 3: all sentences are annotated with predicate argument structures and ontological identifiers
EDU 4: by applying a deep parser and a term recognizer .
EDU 5: during the run time ,
EDU 6: user requests are converted into queries of region algebra on these annotations .
EDU 7: structural matching with pre-computed semantic annotations establishes the accurate and efficient retrieval of relational concepts .
EDU 8: this framework was applied to a text retrieval system for medline .
EDU 9: experiments on the retrieval of biomedical correlations revealed
EDU 10: that the cost is sufficiently small for real-time applications
EDU 11: and that the retrieval precision is significantly improved .
EDU 0:
EDU 1: a query speller is crucial to a search engine
EDU 2: in improving web search relevance .
EDU 3: this paper describes novel methods for use of distributional similarity
EDU 4: estimated from query logs
EDU 5: in learning improved query spelling correction models .
EDU 6: the key to our methods is the property of distributional similarity between two terms :
EDU 7: it is high between a frequently occurring misspelling and its correction , and low between two irrelevant terms only with similar spellings .
EDU 8: we present two models
EDU 9: that are able to take advantage of this property .
EDU 10: experimental results demonstrate
EDU 11: that the distributional similarity based models can significantly outperform their baseline systems in the web query spelling correction task .
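One way to realize the distributional-similarity property described above is cosine similarity between the context-word count vectors of two query terms; the tiny query log below is invented for illustration, and is not the paper's estimation method.

```python
import math
from collections import Counter

def context_vector(term, queries):
    """Count the terms co-occurring with `term` across a tokenized query log."""
    ctx = Counter()
    for query in queries:
        if term in query:
            ctx.update(w for w in query if w != term)
    return ctx

def cosine_sim(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

A frequent misspelling shares contexts with its correction, so its similarity is high, while an unrelated but similarly spelled term shares none.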
EDU 0:
EDU 1: we present a novel pcfg-based architecture for robust probabilistic generation
EDU 2: based on wide-coverage lfg approximations ( cahill et al. , 0000 )
EDU 3: automatically extracted from treebanks ,
EDU 4: maximising the probability of a tree
EDU 5: given an f-structure .
EDU 6: we evaluate our approach
EDU 7: using string-based evaluation .
EDU 8: we currently achieve coverage of 00.00 % , a bleu score of 0.0000 and string accuracy of 0.0000 on the penn-ii wsj section 00 sentences of length ≤00 .
EDU 0:
EDU 1: this paper presents an approach
EDU 2: to incrementally generating locative expressions .
EDU 3: it addresses the issue of combinatorial explosion inherent in the construction of relational context models
EDU 4: by : ( a ) contextually defining the set of objects in the context
EDU 5: that may function as a landmark ,
EDU 6: and ( b ) sequencing the order
EDU 7: in which spatial relations are considered
EDU 8: using a cognitively motivated hierarchy of relations , and visual and discourse salience .
EDU 0:
EDU 1: japanese case markers ,
EDU 2: which indicate the grammatical relation of the complement np to the predicate ,
EDU 3: often pose challenges to the generation of japanese text ,
EDU 4: be it done by a foreign language learner , or by a machine translation ( mt ) system .
EDU 5: in this paper , we describe the task
EDU 6: of predicting japanese case markers
EDU 7: and propose machine learning methods
EDU 8: for solving it in two settings :
EDU 9: ( i ) monolingual , when given information only from the japanese sentence ;
EDU 10: and ( ii ) bilingual , when also given information from a corresponding english source sentence in an mt context .
EDU 11: we formulate the task after the well-studied task of english semantic role labelling ,
EDU 12: and explore features from a syntactic dependency structure of the sentence .
EDU 13: for the monolingual task , we evaluated our models on the kyoto corpus
EDU 14: and achieved over 00 % accuracy
EDU 15: in assigning correct case markers for each phrase .
EDU 16: for the bilingual task , we achieved an accuracy of 00 % per phrase
EDU 17: using a bilingual dataset from a technical domain .
EDU 18: we show
EDU 19: that in both settings , features
EDU 20: that exploit dependency information ,
EDU 21: whether derived from gold-standard annotations
EDU 22: or automatically assigned ,
EDU 23: contribute significantly to the prediction of case markers .
EDU 0:
EDU 1: in this paper we investigate
EDU 2: how to automatically determine
EDU 3: if two document collections are written from different perspectives .
EDU 4: by perspectives we mean a point of view , for example , from the perspective of democrats or republicans .
EDU 5: we propose a test of different perspectives
EDU 6: based on distribution divergence between the statistical models of two collections .
EDU 7: experimental results show
EDU 8: that the test can successfully distinguish document collections of different perspectives from other types of collections .
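The abstract does not name the divergence measure, so the sketch below instantiates the test with Jensen-Shannon divergence between add-one-smoothed unigram models of the two collections; this choice is an assumption, not the paper's exact statistic.

```python
import math
from collections import Counter

def unigram_dist(docs, vocab):
    """Add-one-smoothed unigram distribution of a tokenized document collection."""
    counts = Counter(w for doc in docs for w in doc)
    total = sum(counts[w] + 1 for w in vocab)
    return {w: (counts[w] + 1) / total for w in vocab}

def kl_divergence(p, q):
    """Kullback-Leibler divergence over a shared vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

def js_divergence(p, q):
    """Symmetric Jensen-Shannon divergence between two distributions."""
    m = {w: 0.5 * (p[w] + q[w]) for w in p}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

Collections written from different perspectives should show a markedly larger divergence than two samples from the same perspective.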
EDU 0:
EDU 1: subjectivity and meaning are both important properties of language .
EDU 2: this paper explores their interaction ,
EDU 3: and brings empirical evidence in support of the hypotheses
EDU 4: that ( 0 ) subjectivity is a property
EDU 5: that can be associated with word senses ,
EDU 6: and ( 0 ) word sense disambiguation can directly benefit from subjectivity annotations .
EDU 0:
EDU 1: this paper demonstrates a conceptually simple but effective method
EDU 2: of increasing the accuracy of qa systems on factoid-style questions .
EDU 3: we define the notion of an inverted question ,
EDU 4: and show
EDU 5: that by requiring
EDU 6: that the answers to the original and inverted questions be mutually consistent ,
EDU 7: incorrect answers get demoted in confidence
EDU 8: and correct ones promoted .
EDU 9: additionally , we show
EDU 10: that lack of validation can be used
EDU 11: to assert no-answer ( nil ) conditions .
EDU 12: we demonstrate increases of performance on trec and other question-sets ,
EDU 13: and discuss the kinds of future activities
EDU 14: that can be particularly beneficial to approaches such as ours .
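The mutual-consistency check via question inversion can be sketched as follows; the demotion factor of 0.5 and the `invert_answer_fn` interface are illustrative assumptions, since the abstract does not specify how confidences are adjusted.

```python
def rerank_by_inversion(candidates, invert_answer_fn, question_entity, demote=0.5):
    """Re-rank (answer, confidence) pairs: ask the inverted question about
    each candidate and demote candidates whose inverted answer does not
    recover the entity asked about in the original question."""
    reranked = []
    for answer, confidence in candidates:
        if invert_answer_fn(answer) != question_entity:
            confidence *= demote  # failed the mutual-consistency check
        reranked.append((answer, confidence))
    reranked.sort(key=lambda pair: pair[1], reverse=True)
    return reranked
```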
EDU 0:
EDU 1: statistical ranking methods based on a centroid vector ( profile )
EDU 2: extracted from external knowledge
EDU 3: have become widely adopted in the top definitional qa systems in trec 0000 and 0000 .
EDU 4: in these approaches , terms in the centroid vector are treated as a bag of words
EDU 5: based on the independent assumption .
EDU 6: to relax this assumption ,
EDU 7: this paper proposes a novel language model-based answer reranking method
EDU 8: to improve the existing bag-of-words model approach
EDU 9: by considering the dependence of the words in the centroid vector .
EDU 10: experiments have been conducted
EDU 11: to evaluate the different dependence models .
EDU 12: the results on the trec 0000 test set show
EDU 13: that the reranking approach with the biterm language model significantly outperforms the one with the bag-of-words model and unigram language model by 00.0 % and 00.0 % respectively in f-measure .
EDU 0:
EDU 1: unification grammars are widely accepted as an expressive means
EDU 2: for describing the structure of natural languages .
EDU 3: in general , the recognition problem is undecidable for unification grammars .
EDU 4: even with restricted variants of the formalism , offline parsable grammars ,
EDU 5: the problem is computationally hard .
EDU 6: we present two natural constraints on unification grammars
EDU 7: which limit their expressivity .
EDU 8: we first show
EDU 9: that non-reentrant unification grammars generate exactly the class of context-free languages .
EDU 10: we then relax the constraint
EDU 11: and show
EDU 12: that one-reentrant unification grammars generate exactly the class of tree-adjoining languages .
EDU 13: we thus relate the commonly used and linguistically motivated formalism of unification grammars to more restricted , computationally tractable classes of languages .
EDU 0:
EDU 1: this paper describes a minimal topology driven parsing algorithm for topological grammars
EDU 2: that synchronizes a rewriting grammar and a dependency grammar ,
EDU 3: obtaining two linguistically motivated syntactic structures .
EDU 4: the use of non-local slash and visitor features can be restricted
EDU 5: to obtain a cky type analysis in polynomial time .
EDU 6: german long distance phenomena illustrate the algorithm ,
EDU 7: bringing to the fore the procedural needs of the analyses of syntax-topology mismatches in constraint based approaches like for example hpsg .
EDU 0:
EDU 1: we propose widl-expressions as a flexible formalism
EDU 2: that facilitates the integration of a generic sentence realization system within end-to-end language processing applications .
EDU 3: widl-expressions compactly represent probability distributions over finite sets of candidate realizations ,
EDU 4: and have optimal algorithms for realization via interpolation with language model probability distributions .
EDU 5: we show the effectiveness of a widl-based nlg system in two sentence realization tasks :
EDU 6: automatic translation and headline generation .
EDU 0:
EDU 1: this paper presents a method
EDU 2: for adapting a language generator to the strengths and weaknesses of a synthetic voice ,
EDU 3: thereby improving the naturalness of synthetic speech in a spoken language dialogue system .
EDU 4: the method trains a discriminative reranker
EDU 5: to select paraphrases
EDU 6: that are predicted to sound natural
EDU 7: when synthesized .
EDU 8: the ranker is trained on realizer and synthesizer features in supervised fashion ,
EDU 9: using human judgements of synthetic voice quality on a sample of the paraphrases representative of the generator's capability .
EDU 10: results from a cross-validation study indicate
EDU 11: that discriminative paraphrase reranking can achieve substantial improvements in naturalness on average ,
EDU 12: ameliorating the problem of highly variable synthesis quality
EDU 13: typically encountered with today's unit selection synthesizers .
EDU 0:
EDU 1: this paper shows
EDU 2: that a simple two-stage approach
EDU 3: to handle non-local dependencies in named entity recognition ( ner )
EDU 4: can outperform existing approaches
EDU 5: that handle non-local dependencies ,
EDU 6: while being much more computationally efficient .
EDU 7: ner systems typically use sequence models for tractable inference ,
EDU 8: but this makes them unable to capture the long distance structure present in text .
EDU 9: we use a conditional random field ( crf ) based ner system
EDU 10: using local features
EDU 11: to make predictions
EDU 12: and then train another crf
EDU 13: which uses both local information and features
EDU 14: extracted from the output of the first crf .
EDU 15: using features
EDU 16: capturing non-local dependencies from the same document ,
EDU 17: our approach yields a 00.0 % relative error reduction on the f0 score , over state-of-the-art ner systems
EDU 18: using local-information alone ,
EDU 19: when compared to the 0.0 % relative error reduction
EDU 20: offered by the best systems
EDU 21: that exploit non-local information .
EDU 22: our approach also makes it easy to incorporate non-local information from other documents in the test corpus ,
EDU 23: and this gives us a 00.0 % error reduction over ner systems
EDU 24: using local-information alone .
EDU 25: additionally , our running time for inference is just the inference time of two sequential crfs ,
EDU 26: which is much less than that of other more complicated approaches
EDU 27: that directly model the dependencies
EDU 28: and do approximate inference .
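The two-stage idea above (a second tagger consuming features extracted from a first pass) can be illustrated without CRFs by the simplest such non-local feature: the document-level majority label per token string. This toy stand-in shows the feature extraction step only, not the paper's CRF pipeline.

```python
from collections import Counter, defaultdict

def document_majority_features(tokens, first_pass_labels):
    """For each token, the majority label that the first-pass tagger assigned
    to occurrences of the same token string in the document -- a feature
    capturing non-local label consistency for the second-stage model."""
    votes = defaultdict(Counter)
    for token, label in zip(tokens, first_pass_labels):
        votes[token][label] += 1
    return [votes[token].most_common(1)[0][0] for token in tokens]
```

A second-stage model can then see, for an occurrence the first pass mislabeled, that the same string was labeled as an entity elsewhere in the document.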
EDU 0:
EDU 1: this paper presents an adaptive learning framework for phonetic similarity modeling ( psm )
EDU 2: that supports the automatic construction of transliteration lexicons .
EDU 3: the learning algorithm starts with minimum prior knowledge about machine transliteration ,
EDU 4: and acquires knowledge iteratively from the web .
EDU 5: we study the active learning and the unsupervised learning strategies
EDU 6: that minimize human supervision in terms of data labeling .
EDU 7: the learning process refines the psm
EDU 8: and constructs a transliteration lexicon at the same time .
EDU 9: we evaluate the proposed psm and its learning algorithm through a series of systematic experiments ,
EDU 10: which show
EDU 11: that the proposed framework is reliably effective on two independent databases .
EDU 0:
EDU 1: machine transliteration transcribes a word
EDU 2: written in one script
EDU 3: into another script with approximate phonetic equivalence .
EDU 4: it is useful for machine translation , cross-lingual information retrieval , multilingual text and speech processing .
EDU 5: punjabi machine transliteration ( pmt ) is a special case of machine transliteration
EDU 6: and is a process of converting a word from shahmukhi
EDU 7: ( based on arabic script )
EDU 8: to gurmukhi ( derivation of landa , shardha and takri , old scripts of indian subcontinent ) , two scripts of punjabi , irrespective of the type of word .
EDU 9: the punjabi machine transliteration system uses transliteration rules ( character mappings and dependency rules ) for transliteration of shahmukhi words into gurmukhi .
EDU 10: the pmt system can transliterate every word
EDU 11: written in shahmukhi .
EDU 0:
EDU 1: this paper presents an approach for multilingual document clustering in comparable corpora .
EDU 2: the algorithm is of heuristic nature
EDU 3: and it uses as its sole evidence
EDU 4: for clustering the identification of cognate named entities between both sides of the comparable corpora .
EDU 5: one of the main advantages of this approach is that it does not depend on bilingual or multilingual resources .
EDU 6: however , it depends on the possibility
EDU 7: of identifying cognate named entities between the languages
EDU 8: used in the corpus .
EDU 9: an additional advantage of the approach is that it does not need any information about the right number of clusters ;
EDU 10: the algorithm calculates it .
EDU 11: we have tested this approach with a comparable corpus of news
EDU 12: written in english and spanish .
EDU 13: in addition , we have compared the results with a system
EDU 14: which translates selected document features .
EDU 15: the obtained results are encouraging .
EDU 0:
EDU 1: this study aims at identifying
EDU 2: when an event
EDU 3: written in text
EDU 4: occurs .
EDU 5: in particular , we classify a sentence for an event into four time-slots :
EDU 6: morning , daytime , evening , and night .
EDU 7: to realize our goal ,
EDU 8: we focus on expressions
EDU 9: associated with time-slot ( time-associated words ) .
EDU 10: however , listing all the time-associated words is impractical ,
EDU 11: because there are numerous time-associated expressions .
EDU 12: we therefore use a semi-supervised learning method , the naive bayes classifier
EDU 13: backed up with the expectation maximization algorithm ,
EDU 14: in order to iteratively extract time-associated words
EDU 15: while improving the classifier .
EDU 16: we also propose to use support vector machines
EDU 17: to filter out noisy instances
EDU 18: that indicate no specific time period .
EDU 19: as a result of experiments , the proposed method achieved an accuracy of 0.000
EDU 20: and outperformed other methods .
EDU 0:
EDU 1: given a parallel corpus ,
EDU 2: semantic projection attempts to transfer semantic role annotations from one language to another ,
EDU 3: typically by exploiting word alignments .
EDU 4: in this paper , we present an improved method
EDU 5: for obtaining constituent alignments between parallel sentences
EDU 6: to guide the role projection task .
EDU 7: our extensions are twofold :
EDU 8: ( a ) we model constituent alignment as minimum weight edge covers in a bipartite graph ,
EDU 9: which allows us to find a globally optimal solution efficiently ;
EDU 10: ( b ) we propose tree pruning as a promising strategy
EDU 11: for reducing alignment noise .
EDU 12: experimental results on an english-german parallel corpus demonstrate improvements over state-of-the-art models .
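Constituent alignment as a minimum-weight edge cover, extension (a) above, can be illustrated on toy bipartite graphs by brute force. The paper finds the optimum efficiently (e.g. via a reduction to weighted matching); this exponential sketch only makes the objective concrete.

```python
def min_weight_edge_cover(weights):
    """Brute-force minimum-weight edge cover of a complete bipartite graph.

    weights[i][j] is the cost of aligning source constituent i to target
    constituent j; every node on both sides must be covered by at least one
    chosen edge. Exponential time -- only for toy instances."""
    n, m = len(weights), len(weights[0])
    edges = [(i, j) for i in range(n) for j in range(m)]
    best, best_cover = None, None
    for mask in range(1, 1 << len(edges)):
        chosen = [edges[k] for k in range(len(edges)) if mask >> k & 1]
        if {i for i, _ in chosen} == set(range(n)) and \
           {j for _, j in chosen} == set(range(m)):
            w = sum(weights[i][j] for i, j in chosen)
            if best is None or w < best:
                best, best_cover = w, sorted(chosen)
    return best, best_cover
```

Unlike a matching, an edge cover may align one constituent to several on the other side, which is what makes it a natural fit for unevenly sized parallel constituents.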
EDU 0:
EDU 1: in this paper , we discuss
EDU 2: how to utilize the co-occurrence of answers
EDU 3: in building an automatic question answering system
EDU 4: that answers a series of questions on a specific topic in a batch mode .
EDU 5: experiments show
EDU 6: that the answers to many of the questions in the series usually have a high degree of co-occurrence in relevant document passages .
EDU 7: this feature sometimes can't be easily utilized in an automatic qa system
EDU 8: which processes questions independently .
EDU 9: however it can be utilized in a qa system
EDU 10: that processes questions in a batch mode .
EDU 11: we have used our previous trec qa system as the baseline
EDU 12: and augmented it with new answer clustering and co-occurrence maximization components
EDU 13: to build the batch qa system .
EDU 14: the experiment results show
EDU 15: that the qa system
EDU 16: running in batch mode
EDU 17: gets a significant performance improvement over our baseline trec qa system .
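The co-occurrence maximization component described above can be sketched as picking one answer per question in the series so that the chosen answers co-occur as strongly as possible; the scoring table and brute-force search below are illustrative assumptions, not the paper's component.

```python
from itertools import product

def maximize_cooccurrence(candidate_sets, cooc):
    """Choose one candidate answer per question in the series so that the
    total pairwise co-occurrence score of the chosen answers is maximal.
    Brute force over all combinations; fine for short question series."""
    best_score, best_combo = float("-inf"), None
    for combo in product(*candidate_sets):
        score = sum(cooc.get(frozenset((a, b)), 0.0)
                    for i, a in enumerate(combo)
                    for b in combo[i + 1:])
        if score > best_score:
            best_score, best_combo = score, combo
    return best_combo
```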
EDU 0:
EDU 1: recent studies show
EDU 2: that to achieve mastery of a topic by 00 % of the student population ,
EDU 3: some students need ten times more learning content
EDU 4: than is available in current curricula .
EDU 5: at issue is not just increased volume ,
EDU 6: but the need for highly differentiated content
EDU 7: specialized to promote optimal learning for each unique learner .
EDU 8: to address this synthesis problem
EDU 9: we have developed a generative platform
EDU 10: capable of dynamically varying content
EDU 11: based on the individual student needs .
EDU 12: this approach recently achieved 00 % mastery of a key algebra concept even for primary school students in three state-wide challenges .
EDU 13: in this talk i will describe our work on extending the platform
EDU 14: to enable students to solve all word problems in high-school within their preferred context
EDU 15: ( e.g. sci-fi , medieval , harry potter ) ,
EDU 16: as well as to automatically generate adaptive learning progressions for reading comprehension curricula in middle school .
EDU 0:
EDU 1: we present a series of algorithms with theoretical guarantees
EDU 2: for learning accurate ensembles of several structured prediction rules
EDU 3: for which no prior knowledge is assumed .
EDU 4: this includes a number of randomized and deterministic algorithms
EDU 5: devised by converting on-line learning algorithms to batch ones ,
EDU 6: and a boosting-style algorithm applicable in the context of structured prediction with a large number of labels .
EDU 7: we also report the results of extensive experiments with these algorithms .
EDU 0:
EDU 1: text-level discourse parsing is notoriously difficult ,
EDU 2: as distinctions between discourse relations require subtle semantic judgments
EDU 3: that are not easily captured
EDU 4: using standard features .
EDU 5: in this paper , we present a representation learning approach ,
EDU 6: in which we transform surface features into a latent space
EDU 7: that facilitates rst discourse parsing .
EDU 8: by combining the machinery of large-margin transition-based structured prediction with representation learning ,
EDU 9: our method jointly learns to parse discourse
EDU 10: while at the same time learning a discourse-driven projection of surface features .
EDU 11: the resulting shift-reduce discourse parser obtains substantial improvements over the previous state-of-the-art
EDU 12: in predicting relations and nuclearity on the rst treebank .
EDU 0:
EDU 1: previous research on text-level discourse parsing has mainly made use of constituency structure
EDU 2: to parse the whole document into one discourse tree .
EDU 3: in this paper , we present the limitations of constituency based discourse parsing
EDU 4: and first propose to use dependency structure
EDU 5: to directly represent the relations between elementary discourse units ( edus ) .
EDU 6: the state-of-the-art dependency parsing techniques ,
EDU 7: the eisner algorithm and maximum spanning tree ( mst ) algorithm ,
EDU 8: are adopted
EDU 9: to parse an optimal discourse dependency tree
EDU 10: based on the arc-factored model and the large-margin learning techniques .
EDU 11: experiments show
EDU 12: that our discourse dependency parsers achieve a competitive performance on text-level discourse parsing .
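The Eisner algorithm named above finds the highest-scoring projective dependency tree under an arc-factored model; a compact dynamic-programming sketch with backpointers for tree recovery follows. This is the textbook algorithm, not the paper's implementation.

```python
def eisner(score):
    """Eisner's algorithm: highest-scoring projective dependency tree.

    score[h][d] is the score of an arc from head h to dependent d; node 0 is
    an artificial root. Returns (best score, heads), where heads[d] is the
    head of node d (heads[0] stays None)."""
    n = len(score)
    NEG = float("-inf")
    # dir 0: head at right end of the span; dir 1: head at left end.
    C = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]   # complete spans
    I = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]   # incomplete spans
    Cb = [[[None, None] for _ in range(n)] for _ in range(n)]  # split points
    Ib = [[[None, None] for _ in range(n)] for _ in range(n)]
    for s in range(n):
        C[s][s][0] = C[s][s][1] = 0.0
    for k in range(1, n):
        for s in range(n - k):
            t = s + k
            # incomplete spans: two complete halves joined by the new arc
            for r in range(s, t):
                val = C[s][r][1] + C[r + 1][t][0]
                if val + score[t][s] > I[s][t][0]:
                    I[s][t][0], Ib[s][t][0] = val + score[t][s], r
                if val + score[s][t] > I[s][t][1]:
                    I[s][t][1], Ib[s][t][1] = val + score[s][t], r
            # complete spans: an incomplete span absorbs a complete one
            for r in range(s, t):
                if C[s][r][0] + I[r][t][0] > C[s][t][0]:
                    C[s][t][0], Cb[s][t][0] = C[s][r][0] + I[r][t][0], r
            for r in range(s + 1, t + 1):
                if I[s][r][1] + C[r][t][1] > C[s][t][1]:
                    C[s][t][1], Cb[s][t][1] = I[s][r][1] + C[r][t][1], r

    heads = [None] * n

    def backtrack(s, t, d, complete):
        if s == t:
            return
        if complete:
            r = Cb[s][t][d]
            if d == 0:
                backtrack(s, r, 0, True)
                backtrack(r, t, 0, False)
            else:
                backtrack(s, r, 1, False)
                backtrack(r, t, 1, True)
        else:
            r = Ib[s][t][d]
            heads[s if d == 0 else t] = t if d == 0 else s
            backtrack(s, r, 1, True)
            backtrack(r + 1, t, 0, True)

    backtrack(0, n - 1, 1, True)
    return C[0][n - 1][1], heads
```

For non-projective discourse dependency trees the MST (Chu-Liu/Edmonds) algorithm, also named above, drops the projectivity constraint that this chart parser enforces.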
EDU 0:
EDU 1: a key challenge for computational conversation models is to discover latent structure in task-oriented dialogue ,
EDU 2: since it provides a basis
EDU 3: for analysing , evaluating , and building conversational systems .
EDU 4: we propose three new unsupervised models
EDU 5: to discover latent structures in task-oriented dialogues .
EDU 6: our methods synthesize hidden markov models ( for underlying state )
EDU 7: and topic models
EDU 8: ( to connect words to states ) .
EDU 9: we apply them to two real , non-trivial datasets :
EDU 10: human-computer spoken dialogues in bus query service , and human-human text-based chats from a live technical support service .
EDU 11: we show
EDU 12: that our models extract meaningful state representations and dialogue structures
EDU 13: consistent with human annotations .
EDU 14: quantitatively , we show
EDU 15: our models achieve superior performance on held-out log likelihood evaluation and an ordering task .
EDU 0:
EDU 1: we investigate different ways of learning structured perceptron models for coreference resolution
EDU 2: when using non-local features and beam search .
EDU 3: our experimental results indicate
EDU 4: that standard techniques such as early updates or learning as search optimization ( laso ) perform worse than a greedy baseline
EDU 5: that only uses local features .
EDU 6: by modifying laso to delay updates until the end of each instance
EDU 7: we obtain significant improvements over the baseline .
EDU 8: our model obtains the best results to date on recent shared task data for arabic , chinese , and english .
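The delayed-update variant of LaSO described above can be sketched as follows. This is a schematic toy (hypothetical bigram features, two labels, beam of two), not the paper's coreference system: violations are recorded whenever the gold prefix falls out of the beam, but the weight vector is only updated once, at the end of the instance.

```python
# Sketch of delayed LaSO for a structured perceptron with beam search.
from collections import defaultdict

def features(prefix, label):
    prev = prefix[-1] if prefix else '<s>'
    return [('bigram', prev, label)]

def score(w, feats):
    return sum(w[f] for f in feats)

def train_instance(w, tokens, gold, beam_size=2, labels=('A', 'B')):
    beam = [([], 0.0)]
    delayed = defaultdict(float)  # accumulated update, applied at the end
    for i, _ in enumerate(tokens):
        expanded = []
        for prefix, s in beam:
            for lab in labels:
                expanded.append((prefix + [lab],
                                 s + score(w, features(prefix, lab))))
        expanded.sort(key=lambda x: -x[1])
        beam = expanded[:beam_size]
        if all(p != gold[:i + 1] for p, _ in beam):
            # violation: push the gold decision up, the best wrong one down ...
            for f in features(gold[:i], gold[i]):
                delayed[f] += 1.0
            wrong, _ = beam[0]
            for f in features(wrong[:-1], wrong[-1]):
                delayed[f] -= 1.0
            # ... but, unlike standard LaSO, do NOT update w or reset the
            # beam here; keep decoding with the current weights.
    for f, v in delayed.items():
        w[f] += v
    return w
```

Standard LaSO would instead apply the update and restart the beam from the gold prefix at each violation.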
EDU 0:
EDU 1: we present a novel technique
EDU 2: for learning semantic representations ,
EDU 3: which extends the distributional hypothesis to multilingual data and joint-space embeddings .
EDU 4: our models leverage parallel data
EDU 5: and learn to strongly align the embeddings of semantically equivalent sentences ,
EDU 6: while maintaining sufficient distance between those of dissimilar sentences .
EDU 7: the models do not rely on word alignments or any syntactic information
EDU 8: and are successfully applied to a number of diverse languages .
EDU 9: we extend our approach to learn semantic representations at the document level , too .
EDU 10: we evaluate these models on two cross-lingual document classification tasks ,
EDU 11: outperforming the prior state of the art .
EDU 12: through qualitative analysis and the study of pivoting effects
EDU 13: we demonstrate
EDU 14: that our representations are semantically plausible
EDU 15: and can capture semantic relationships across languages
EDU 16: without parallel data .
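The alignment objective described above can be sketched with a hinge loss: compose each sentence additively from word vectors (no word alignments or syntax needed) and require parallel sentences to lie closer together than a sampled noise sentence, by a margin. Vectors and words below are hypothetical toy data, not the paper's setup.

```python
# Sketch of the margin-based bilingual sentence-embedding objective.
import numpy as np

def sent_embed(words, vecs):
    # additive composition: no word alignment or syntactic information
    return np.sum([vecs[w] for w in words], axis=0)

def margin_loss(src, tgt, noise, vecs, margin=1.0):
    """Hinge loss: the parallel pair (src, tgt) must be closer (squared
    euclidean distance) than (src, noise) by at least `margin`."""
    d_pos = np.sum((sent_embed(src, vecs) - sent_embed(tgt, vecs)) ** 2)
    d_neg = np.sum((sent_embed(src, vecs) - sent_embed(noise, vecs)) ** 2)
    return max(0.0, margin + d_pos - d_neg)
```

Training would backpropagate this loss into the word vectors over many sampled noise sentences.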
EDU 0:
EDU 1: in this work , we revisit shared task 0 from the 0000 * sem conference :
EDU 2: the automated analysis of negation .
EDU 3: unlike the vast majority of participating systems in 0000 ,
EDU 4: our approach works over explicit and formal representations of propositional semantics ,
EDU 5: i.e. it derives the notion of negation scope
EDU 6: assumed in this task from the structure of logical-form meaning representations .
EDU 7: we relate the task-specific interpretation of ( negation ) scope to the concept of ( quantifier and operator ) scope in mainstream underspecified semantics .
EDU 8: with reference to an explicit encoding of semantic predicate-argument structure , we can operationalize the annotation decisions
EDU 9: made for the 0000 * sem task ,
EDU 10: and demonstrate
EDU 11: how a comparatively simple system for negation scope resolution can be built from an off-the-shelf deep parsing system .
EDU 12: in a system combination setting , our approach improves over the best published results on this task to date .
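The idea of reading negation scope off a semantic structure can be sketched as graph traversal: starting from the negated predicate's argument in a (here, purely hypothetical) predicate-argument graph, everything reachable falls inside the scope. This is an illustration of the general recipe only, not the paper's logical-form machinery.

```python
# Sketch: project negation scope from a toy predicate-argument graph.
def negation_scope(graph, neg_node):
    """Collect every node reachable from the negated predicate's argument;
    the graph encoding ({'arg': ..., 'deps': [...]}) is invented for
    illustration."""
    scope, stack = set(), [graph[neg_node]['arg']]
    while stack:
        n = stack.pop()
        if n in scope:
            continue
        scope.add(n)
        stack.extend(graph.get(n, {}).get('deps', []))
    return scope
```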
EDU 0:
EDU 1: dependency-based compositional semantics ( dcs ) is a framework of natural language semantics with easy-to-process structures as well as strict semantics .
EDU 2: in this paper , we equip the dcs framework with logical inference ,
EDU 3: by defining abstract denotations as an abstraction of the computing process of denotations in original dcs .
EDU 4: an inference engine is built
EDU 5: to achieve inference on abstract denotations .
EDU 6: furthermore , we propose a way
EDU 7: to generate on-the-fly knowledge in logical inference ,
EDU 8: by combining our framework with the idea of tree transformation .
EDU 9: experiments on fracas and pascal rte datasets show promising results .
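Denotation computation in a DCS-like setting can be caricatured as set operations over a model: a leaf word denotes its extension in the world, and an intersective node denotes the intersection of its children. This toy evaluator only illustrates concrete denotations; it is not the paper's abstract denotations or inference engine.

```python
# Toy denotation computation in the spirit of DCS (illustrative only).
def denotation(tree, world):
    """A leaf (string) denotes its extension in `world`; an
    ('intersect', child, ...) node denotes the intersection of its
    children's denotations."""
    if isinstance(tree, str):
        return set(world[tree])
    op, *kids = tree
    if op == 'intersect':
        result = denotation(kids[0], world)
        for k in kids[1:]:
            result &= denotation(k, world)
        return result
    raise ValueError(op)
```

Abstract denotations, as the abstract describes them, abstract over exactly this computation so that inference can proceed without an explicit world.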
EDU 0:
EDU 1: distributional semantic methods
EDU 2: to approximate word meaning with context vectors
EDU 3: have been very successful empirically ,
EDU 4: and the last years have seen a surge of interest in their compositional extension to phrases and sentences .
EDU 5: we present here a new model
EDU 6: that , like those of coecke et al ( 0000 ) and baroni and zamparelli ( 0000 ) ,
EDU 7: closely mimics the standard montagovian semantic treatment of composition in distributional terms .
EDU 8: however , our approach avoids a number of issues
EDU 9: that have prevented the application of the earlier linguistically-motivated models to full-fledged , real-life sentences .
EDU 10: we test the model on a variety of empirical tasks ,
EDU 11: showing
EDU 12: that it consistently outperforms a set of competitive rivals .
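The Montagovian treatment of composition in distributional terms can be sketched as function application: a functor word (e.g. an adjective) is a matrix that acts on its argument's context vector. The matrix and vector below are hypothetical toy values, not trained representations.

```python
# Sketch of composition as linear function application.
import numpy as np

def compose(func_matrix, arg_vector):
    # the functor word is a matrix acting on its argument's vector,
    # mirroring Montagovian function application in distributional terms
    return func_matrix @ arg_vector
```

Composing a full sentence then amounts to applying such functions recursively along the syntactic structure.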
EDU 0:
EDU 1: morphological segmentation is an effective sparsity reduction strategy for statistical machine translation ( smt )
EDU 2: involving morphologically complex languages .
EDU 3: when translating into a segmented language ,
EDU 4: an extra step is required
EDU 5: to desegment the output ;
EDU 6: previous studies have desegmented the 0-best output from the decoder .
EDU 7: in this paper , we expand our translation options
EDU 8: by desegmenting n-best lists or lattices .
EDU 9: our novel lattice desegmentation algorithm effectively combines both segmented and desegmented views of the target language for a large subspace of possible translation outputs ,
EDU 10: which allows for inclusion of features
EDU 11: related to the desegmentation process ,
EDU 12: as well as an unsegmented language model ( lm ) .
EDU 13: we investigate this technique in the context of english-to-arabic and english-to-finnish translation ,
EDU 14: showing significant improvements in translation quality over desegmentation of 0-best decoder outputs .
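The n-best variant of the desegmentation idea can be sketched as follows: join segmenter-produced morphemes back into surface words, then rerank hypotheses with an unsegmented-LM feature added to the decoder score. The trailing-`+` prefix-morpheme convention and the toy tokens are assumptions for illustration, not the paper's segmentation scheme.

```python
# Sketch: desegment hypotheses and rerank an n-best list.
def desegment(tokens):
    """Join morphemes into surface words, assuming a (hypothetical)
    convention where a trailing '+' marks a prefix that attaches to the
    following token."""
    words, buf = [], ''
    for tok in tokens:
        if tok.endswith('+'):
            buf += tok[:-1]
        else:
            words.append(buf + tok)
            buf = ''
    if buf:
        words.append(buf)
    return words

def rerank(nbest, unsegmented_lm):
    # add an unsegmented-LM feature to each hypothesis and pick the best
    return max(nbest,
               key=lambda h: h['score'] + unsegmented_lm(desegment(h['tokens'])))
```

The lattice algorithm in the abstract generalizes this from a list of hypotheses to a whole translation lattice.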
EDU 0:
EDU 1: we propose bilingually-constrained recursive auto-encoders ( brae )
EDU 2: to learn semantic phrase embeddings
EDU 3: ( compact vector representations for phrases ) ,
EDU 4: which can distinguish the phrases with different semantic meanings .
EDU 5: the brae is trained in a way
EDU 6: that minimizes the semantic distance of translation equivalents
EDU 7: and maximizes the semantic distance of non-translation pairs simultaneously .
EDU 8: after training ,
EDU 9: the model learns how to embed each phrase semantically in two languages
EDU 10: and also learns how to transform the semantic embedding space of one language into the other .
EDU 11: we evaluate our proposed method on two end-to-end smt tasks
EDU 12: ( phrase table pruning and decoding with phrasal semantic similarities )
EDU 13: which need to measure semantic similarity between a source phrase and its translation candidates .
EDU 14: extensive experiments show
EDU 15: that the brae is remarkably effective in these two tasks .
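The phrase-table-pruning use case can be sketched like this: score each phrase pair by the cosine between the source phrase embedding mapped into the target space (via the learned cross-lingual transform) and the target phrase embedding, and discard low-similarity pairs. The matrix, vectors, and threshold below are illustrative assumptions, not trained BRAE parameters.

```python
# Sketch: semantic-similarity pruning of a phrase table.
import numpy as np

def semantic_sim(src_vec, tgt_vec, W):
    # cosine between the transformed source embedding and the target one
    s = W @ src_vec
    return float(s @ tgt_vec / (np.linalg.norm(s) * np.linalg.norm(tgt_vec)))

def prune_phrase_table(table, W, threshold):
    """Keep only (src, tgt) pairs whose phrasal semantic similarity clears
    the threshold; each entry is (src, tgt, src_vec, tgt_vec)."""
    return [(s, t) for (s, t, sv, tv) in table
            if semantic_sim(sv, tv, W) >= threshold]
```

The same similarity can be added as a decoding feature, the second task the abstract evaluates.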
EDU 0:
EDU 1: in this paper , instead of designing new features
EDU 2: based on intuition , linguistic knowledge and domain ,
EDU 3: we learn some new and effective features
EDU 4: using the deep autoencoder ( dae ) paradigm for phrase-based translation model .
EDU 5: using the unsupervised pre-trained deep belief net ( dbn )
EDU 6: to initialize dae's parameters
EDU 7: and using the input original phrase features as a teacher for semi-supervised fine-tuning ,
EDU 8: we learn new semi-supervised dae features ,
EDU 9: which are more effective and stable than the unsupervised dbn features .
EDU 10: moreover , to learn high dimensional feature representation ,
EDU 11: we introduce a natural horizontal composition of multiple daes for learning features with large hidden layers .
EDU 12: on two chinese-english tasks , our semi-supervised dae features obtain statistically significant improvements of 0.00/0.00 ( iwslt ) and 0.00/0.00 ( nist ) bleu points over the unsupervised dbn features and the baseline features , respectively .
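The encode-decode cycle at the heart of the DAE features can be sketched with a tiny autoencoder: the hidden code would serve as the learned phrase feature, and training pushes the reconstruction toward the original phrase features (the "teacher" signal). Sizes, learning rate, and initialization are toy assumptions; this is a sketch, not the paper's DBN-initialized model.

```python
# Minimal autoencoder sketch: hidden code = learned feature.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TinyAutoencoder:
    def __init__(self, n_in, n_hid):
        self.W1 = rng.normal(0, 0.1, (n_hid, n_in))  # encoder
        self.W2 = rng.normal(0, 0.1, (n_in, n_hid))  # decoder

    def encode(self, x):
        return sigmoid(self.W1 @ x)

    def step(self, x, lr=0.1):
        # one gradient step on squared reconstruction error against the
        # original input features (the "teacher")
        h = self.encode(x)
        y = sigmoid(self.W2 @ h)
        err = y - x
        delta2 = err * y * (1 - y)
        delta1 = (self.W2.T @ delta2) * h * (1 - h)
        self.W2 -= lr * np.outer(delta2, h)
        self.W1 -= lr * np.outer(delta1, x)
        return float(np.sum(err ** 2))
```

In the paper's setting the weights would be initialized from a pre-trained DBN rather than at random.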
EDU 0:
EDU 1: statistical machine translation ( smt ) usually utilizes contextual information
EDU 2: to disambiguate translation candidates .
EDU 3: however , it is often limited to contexts within sentence boundaries ,
EDU 4: hence broader topical information cannot be leveraged .
EDU 5: in this paper , we propose a novel approach
EDU 6: to learning topic representation for parallel data
EDU 7: using a neural network architecture ,
EDU 8: where abundant topical contexts are embedded
EDU 9: via topic relevant monolingual data .
EDU 10: by associating each translation rule with the topic representation ,
EDU 11: topic relevant rules are selected
EDU 12: according to the distributional similarity with the source text during smt decoding .
EDU 13: experimental results show
EDU 14: that our method significantly improves translation accuracy in the nist chinese-to-english translation task
EDU 15: compared to a state-of-the-art baseline .
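Selecting topic-relevant rules by distributional similarity can be sketched as one extra feature in the log-linear decoder score: the cosine between a rule's topic representation and the source text's. The vectors and feature weight below are illustrative, not learned values.

```python
# Sketch: topic similarity as an additional decoding feature.
import numpy as np

def topic_similarity(rule_vec, source_vec):
    # cosine between the rule's topic vector and the source text's
    return float(rule_vec @ source_vec /
                 (np.linalg.norm(rule_vec) * np.linalg.norm(source_vec)))

def decode_score(base_score, rule_vec, source_vec, weight=1.0):
    # the similarity enters the log-linear model as one weighted feature
    return base_score + weight * topic_similarity(rule_vec, source_vec)
```

Rules whose topic vectors align with the source text thus get a boost during decoding.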
EDU 0:
EDU 1: in this paper , we address the problem of web-domain pos tagging
EDU 2: using a two-phase approach .
EDU 3: the first phase learns representations
EDU 4: that capture regularities
EDU 5: underlying web text .
EDU 6: the representations are integrated as features into a neural network
EDU 7: that serves as a scorer for an easy-first pos tagger .
EDU 8: parameters of the neural network are trained
EDU 9: using guided learning in the second phase .
EDU 10: experiments on the sancl 0000 shared task show
EDU 11: that our approach achieves 00.00 % average tagging accuracy ,
EDU 12: which is the best accuracy
EDU 13: reported so far on this data set , higher than those
EDU 14: given by ensembled syntactic parsers .
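The easy-first decoding strategy mentioned above can be sketched as a loop that, at each step, commits to the single most confident (position, tag) decision over all still-untagged tokens. The scorer here is an arbitrary callable standing in for the neural-network scorer; the tagset and tokens in the usage are toy assumptions.

```python
# Sketch of easy-first decoding for POS tagging.
def easy_first_tag(tokens, scorer, tagset):
    """scorer(tokens, tags, i, t) returns the confidence of assigning tag
    t at position i given the partial tagging `tags` (None = untagged)."""
    tags = [None] * len(tokens)
    while None in tags:
        best = max(
            ((i, t, scorer(tokens, tags, i, t))
             for i in range(len(tokens)) if tags[i] is None
             for t in tagset),
            key=lambda x: x[2])
        tags[best[0]] = best[1]
    return tags
```

Because already-committed tags feed back into the scorer, easy decisions can inform harder ones.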
EDU 0:
EDU 1: discussion forums have evolved into a dependable source of knowledge
EDU 2: to solve common problems .
EDU 3: however , only a minority of the posts in discussion forums are solution posts .
EDU 4: identifying solution posts from discussion forums , hence , is an important research problem .
EDU 5: in this paper , we present a technique for unsupervised solution post identification
EDU 6: leveraging a so far unexplored textual feature , that of lexical correlations between problems and solutions .
EDU 7: we use translation models and language models
EDU 8: to exploit lexical correlations and solution post character respectively .
EDU 9: our technique is designed
EDU 10: to not rely much on structural features such as post metadata
EDU 11: since such features are often not uniformly available across forums .
EDU 12: our clustering-based iterative solution identification approach
EDU 13: based on the em-formulation
EDU 14: performs favorably in an empirical evaluation ,
EDU 15: beating the only unsupervised solution identification technique from literature by a very large margin .
EDU 16: we also show
EDU 17: that our unsupervised technique is competitive against methods
EDU 18: that require supervision ,
EDU 19: outperforming one such technique comfortably .
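The combination of translation and language models can be sketched as an interpolated score: an IBM-model-1-style translation term captures lexical correlation between the problem and a candidate post, and a solution language model captures how "solution-like" the post's wording is. The probability tables and interpolation weight below are toy assumptions, not the paper's EM-trained models.

```python
# Sketch: score a candidate post as a solution to a problem.
import math

def solution_score(problem_words, post_words, trans_prob, sol_lm, alpha=0.5):
    tm = 0.0
    for w in post_words:
        # average lexical correlation of w with the problem's words
        tm += math.log(sum(trans_prob.get((p, w), 1e-9) for p in problem_words)
                       / len(problem_words))
    # unigram solution language model (solution-post character)
    lm = sum(math.log(sol_lm.get(w, 1e-9)) for w in post_words)
    return (alpha * tm + (1 - alpha) * lm) / len(post_words)
```

In the paper's clustering-based approach, such scores and the model parameters would be re-estimated iteratively in an EM loop.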
EDU 0:
EDU 1: while user attribute extraction on social media has received considerable attention ,
EDU 2: existing approaches , mostly supervised , encounter great difficulty
EDU 3: in obtaining gold standard data
EDU 4: and are therefore limited to predicting unary predicates
EDU 5: ( e.g. , gender ) .
EDU 6: in this paper , we present a weakly-supervised approach to user profile extraction from twitter .
EDU 7: users' profiles from social media websites such as facebook or google plus are used as a distant source of supervision for extraction of their attributes from user-generated text .
EDU 8: in addition to traditional linguistic features
EDU 9: used in distant supervision for information extraction ,
EDU 10: our approach also takes into account network information , a unique opportunity
EDU 11: offered by social media .
EDU 12: we test our algorithm on three attribute domains :
EDU 13: spouse , education and job ;
EDU 14: experimental results demonstrate
EDU 15: our approach is able to make accurate predictions for users' attributes
EDU 16: based on their tweets .
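The distant-supervision step can be sketched very simply: attribute values taken from a user's profile on another site label their tweets automatically, so any tweet mentioning the known value becomes a positive training example. The matching rule and example strings are illustrative assumptions.

```python
# Sketch: distant supervision from a known profile attribute value.
def distant_labels(tweets, known_value):
    """Label each tweet True if it mentions the attribute value taken from
    the user's profile (a naive token-match stand-in for the real matcher)."""
    data = []
    for t in tweets:
        data.append((t, known_value.lower() in t.lower().split()))
    return data
```

A classifier trained on such noisy labels, plus the network features the abstract mentions, would then generalize to users without profiles.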
EDU 0:
EDU 1: consider a person
EDU 2: trying to spread an important message on a social network .
EDU 3: he/she can spend hours trying to craft the message .
EDU 4: does it actually matter ?
EDU 5: while there has been extensive prior work
EDU 6: looking into predicting popularity of social-media content ,
EDU 7: the effect of wording per se has rarely been studied
EDU 8: since it is often confounded with the popularity of the author and the topic .
EDU 9: to control for these confounding factors ,
EDU 10: we take advantage of the surprising fact
EDU 11: that there are many pairs of tweets
EDU 12: containing the same url
EDU 13: and written by the same user
EDU 14: but employing different wording .
EDU 15: given such pairs ,
EDU 16: we ask :
EDU 17: which version attracts more retweets ?
EDU 18: this turns out to be a more difficult task
EDU 19: than predicting popular topics .
EDU 20: still , humans can answer this question better than chance
EDU 21: ( but far from perfectly ) ,
EDU 22: and the computational methods
EDU 23: we develop
EDU 24: can do better than both an average human and a strong competing method
EDU 25: trained on non-controlled data .
EDU 0:
EDU 1: existing models for social media personal analytics assume access to thousands of messages per user ,
EDU 2: even though most users author content only sporadically over time .
EDU 3: given this sparsity ,
EDU 4: we : ( i ) leverage content from the local neighborhood of a user ;
EDU 5: ( ii ) evaluate batch models as a function of size and the amount of messages in various types of neighborhoods ;
EDU 6: and ( iii ) estimate the amount of time and tweets
EDU 7: required for a dynamic model
EDU 8: to predict user preferences .
EDU 9: we show
EDU 10: that even when limited or no self-authored data is available ,
EDU 11: language from friend , retweet and user mention communications provides sufficient evidence for prediction .
EDU 12: when updating models over time
EDU 13: based on twitter ,
EDU 14: we find
EDU 15: that political preference can often be predicted
EDU 16: using roughly 000 tweets ,
EDU 17: depending on the context of user selection ,
EDU 18: where this could mean hours , or weeks ,
EDU 19: based on the author's tweeting frequency .
EDU 0:
EDU 1: much of the recent work on dependency parsing has been focused on solving inherent combinatorial problems
EDU 2: associated with rich scoring functions .
EDU 3: in contrast , we demonstrate
EDU 4: that highly expressive scoring functions can be used with substantially simpler inference procedures .
EDU 5: specifically , we introduce a sampling-based parser
EDU 6: that can easily handle arbitrary global features .
EDU 7: inspired by samplerank ,
EDU 8: we learn to take guided stochastic steps towards a high scoring parse .
EDU 9: we introduce two samplers
EDU 10: for traversing the space of trees , gibbs and metropolis-hastings with random walk .
EDU 11: the model outperforms state-of-the-art results
EDU 12: when evaluated on 00 languages of non-projective conll datasets .
EDU 13: our sampling-based approach naturally extends to joint prediction scenarios , such as joint parsing and pos correction .
EDU 14: the resulting method outperforms the best reported results on the catib dataset ,
EDU 15: approaching performance of parsing with gold tags .
EDU 0:
EDU 1: due to their origin in computer graphics ,
EDU 2: graphics processing units ( gpus ) are highly optimized for dense problems ,
EDU 3: where the exact same operation is applied repeatedly to all data points .
EDU 4: natural language processing algorithms , on the other hand , are traditionally constructed in ways
EDU 5: that exploit structural sparsity .
EDU 6: recently , canny et al. ( 0000 ) presented an approach to gpu parsing
EDU 7: that sacrifices traditional sparsity in exchange for raw computational power ,
EDU 8: obtaining a system
EDU 9: that can compute viterbi parses for a high-quality grammar at about 000 sentences per second on a mid-range gpu .
EDU 10: in this work , we reintroduce sparsity to gpu parsing
EDU 11: by adapting a coarse-to-fine pruning approach to the constraints of a gpu .
EDU 12: the resulting system is capable of computing over 000 viterbi parses per second , more than a 0x speedup on the same hardware .
EDU 13: moreover , our approach allows us to efficiently implement less gpu-friendly minimum bayes risk inference ,
EDU 14: improving throughput for this more accurate algorithm from only 00 sentences per second unpruned to over 000 sentences per second
EDU 15: using pruning , nearly a 0x speedup .
EDU 0:
EDU 1: this paper presents the first dependency model for a shift-reduce ccg parser .
EDU 2: modelling dependencies is desirable for a number of reasons ,
EDU 3: including handling the " spurious " ambiguity of ccg ;
EDU 4: fitting well with the theory of ccg ;
EDU 5: and optimizing for structures
EDU 6: which are evaluated at test time .
EDU 7: we develop a novel training technique
EDU 8: using a dependency oracle ,
EDU 9: in which all derivations are hidden .
EDU 10: a challenge arises from the fact
EDU 11: that the oracle needs to keep track of exponentially many gold-standard derivations ,
EDU 12: which is solved
EDU 13: by integrating a packed parse forest with the beam-search decoder .
EDU 14: standard ccgbank tests show
EDU 15: the model achieves up to 0.00 labeled f-score improvements over three existing , competitive ccg parsing models .
EDU 0:
EDU 1: we present a parser
EDU 2: that relies primarily on extracting information directly from surface spans
EDU 3: rather than on propagating information
EDU 4: through enriched grammar structure .
EDU 5: for example , instead of creating separate grammar symbols
EDU 6: to mark the definiteness of an np ,
EDU 7: our parser might instead capture the same information from the first word of the np .
EDU 8: moving context out of the grammar and onto surface features can greatly simplify the structural component of the parser :
EDU 9: because so many deep syntactic cues have surface reflexes ,
EDU 10: our system can still parse accurately with context-free backbones as minimal as x-bar grammars .
EDU 11: keeping the structural backbone simple and moving features to the surface also allows easy adaptation to new languages and even to new tasks .
EDU 12: on the spmrl 0000 multilingual constituency parsing shared task ( seddah et al. , 0000 ) , our system outperforms the top single parser system of björkelund et al. ( 0000 ) on a range of languages .
EDU 13: in addition , despite being designed for syntactic analysis ,
EDU 14: our system also achieves state-of-the-art numbers on the structural sentiment task of socher et al. ( 0000 ) .
EDU 15: finally , we show
EDU 16: that , in both syntactic parsing and sentiment analysis , many broad linguistic trends can be captured
EDU 17: via surface features .
EDU 0:
EDU 1: context-predicting models
EDU 2: ( more commonly known as embeddings or neural language models )
EDU 3: are the new kids on the distributional semantics block .
EDU 4: despite the buzz
EDU 5: surrounding these models ,
EDU 6: the literature is still lacking a systematic comparison of the predictive models with classic , count-vector-based distributional semantic approaches .
EDU 7: in this paper , we perform such an extensive evaluation , on a wide range of lexical semantics tasks and across many parameter settings .
EDU 8: the results , to our own surprise , show
EDU 9: that the buzz is fully justified ,
EDU 10: as the context-predicting models obtain a thorough and resounding victory against their count-based counterparts .
EDU 0:
EDU 1: we show
EDU 2: that it is possible to reliably discriminate whether a syntactic construction is meant literally or metaphorically
EDU 3: using lexical semantic features of the words
EDU 4: that participate in the construction .
EDU 5: our model is constructed
EDU 6: using english resources ,
EDU 7: and we obtain state-of-the-art performance relative to previous work in this language .
EDU 8: using a model transfer approach by pivoting through a bilingual dictionary ,
EDU 9: we show
EDU 10: our model can identify metaphoric expressions in other languages .
EDU 11: we provide results on three new test sets in spanish , farsi , and russian .
EDU 12: the results support the hypothesis
EDU 13: that metaphors are conceptual , rather than lexical , in nature .
EDU 0:
EDU 1: unsupervised word sense disambiguation ( wsd ) methods are an attractive approach to all-words wsd
EDU 2: due to their non-reliance on expensive annotated data .
EDU 3: unsupervised estimates of sense frequency have been shown to be very useful for wsd
EDU 4: due to the skewed nature of word sense distributions .
EDU 5: this paper presents a fully unsupervised topic modelling-based approach to sense frequency estimation ,
EDU 6: which is highly portable to different corpora and sense inventories ,
EDU 7: in being applicable to any part of speech ,
EDU 8: and not requiring a hierarchical sense inventory , parsing or parallel text .
EDU 9: we demonstrate the effectiveness of the method over the tasks of predominant sense learning and sense distribution acquisition , and also the novel tasks
EDU 10: of detecting senses
EDU 11: which aren't attested in the corpus ,
EDU 12: and identifying novel senses in the corpus
EDU 13: which aren't captured in the sense inventory .
EDU 0:
EDU 1: we present an approach
EDU 2: for automatically learning to solve algebra word problems .
EDU 3: our algorithm reasons across sentence boundaries
EDU 4: to construct and solve a system of linear equations ,
EDU 5: while simultaneously recovering an alignment of the variables and numbers in these equations to the problem text .
EDU 6: the learning algorithm uses varied supervision ,
EDU 7: including either full equations or just the final answers .
EDU 8: we evaluate performance on a newly gathered corpus of algebra word problems ,
EDU 9: demonstrating
EDU 10: that the system can correctly answer almost 00 % of the questions in the dataset .
EDU 11: this is , to our knowledge , the first learning result for this task .
EDU 0:
EDU 1: inspired by experimental psychological findings
EDU 2: suggesting
EDU 3: that function words play a special role in word learning ,
EDU 4: we make a simple modification to an adaptor grammar based bayesian word segmentation model
EDU 5: to allow it to learn sequences of monosyllabic "function words" at the beginnings and endings of collocations of ( possibly multi-syllabic ) words .
EDU 6: this modification improves unsupervised word segmentation on the standard bernstein-ratner ( 0000 ) corpus of child-directed english by more than 0 % token f-score
EDU 7: compared to a model identical
EDU 8: except that it does not special-case "function words" ,
EDU 9: setting a new state-of-the-art of 00.0 % token f-score .
EDU 10: our function word model assumes
EDU 11: that function words appear at the left periphery ,
EDU 12: and while this is true of languages such as english ,
EDU 13: it is not true universally .
EDU 14: we show
EDU 15: that a learner can use bayesian model selection
EDU 16: to determine the location of function words in their language ,
EDU 17: even though the input to the model only consists of unsegmented sequences of phones .
EDU 18: thus our computational models support the hypothesis
EDU 19: that function words play a special role in word learning .
EDU 0:
EDU 1: recently , neural network models for natural language processing tasks have attracted increasing attention for their ability
EDU 2: to alleviate the burden of manual feature engineering .
EDU 3: in this paper , we propose a novel neural network model for chinese word segmentation
EDU 4: called max-margin tensor neural network ( mmtnn ) .
EDU 5: by exploiting tag embeddings and tensor-based transformation ,
EDU 6: mmtnn has the ability
EDU 7: to model complicated interactions between tags and context characters .
EDU 8: furthermore , a new tensor factorization approach is proposed
EDU 9: to speed up the model
EDU 10: and avoid overfitting .
EDU 11: experiments on the benchmark dataset show
EDU 12: that our model achieves better performances than previous neural network models
EDU 13: and that our model can achieve a competitive performance with minimal feature engineering .
EDU 14: despite chinese word segmentation being a specific case ,
EDU 15: mmtnn can be easily generalized and applied to other sequence labeling tasks .
EDU 0:
EDU 1: negation words , such as no and not , play a fundamental role
EDU 2: in modifying sentiment of textual expressions .
EDU 3: we will refer to a negation word as the negator and the text span within the scope of the negator as the argument .
EDU 4: commonly used heuristics
EDU 5: to estimate the sentiment of negated expressions
EDU 6: rely simply on the sentiment of the argument
EDU 7: ( and not on the negator or the argument itself ) .
EDU 8: we use a sentiment treebank
EDU 9: to show
EDU 10: that these existing heuristics are poor estimators of sentiment .
EDU 11: we then modify these heuristics to be dependent on the negators
EDU 12: and show
EDU 13: that this improves prediction .
EDU 14: next , we evaluate a recently proposed composition model ( socher et al. , 0000 )
EDU 15: that relies on both the negator and the argument .
EDU 16: this model learns the syntax and semantics of the negator's argument with a recursive neural network .
EDU 17: we show
EDU 18: that this approach performs better than those
EDU 19: mentioned above .
EDU 20: in addition , we explicitly incorporate the prior sentiment of the argument
EDU 21: and observe
EDU 22: that this information can help reduce fitting errors .
EDU 0:
EDU 1: extracting opinion targets and opinion words from online reviews are two fundamental tasks in opinion mining .
EDU 2: this paper proposes a novel approach
EDU 3: to collectively extract them with graph co-ranking .
EDU 4: first , compared to previous methods
EDU 5: which solely employed opinion relations among words ,
EDU 6: our method constructs a heterogeneous graph
EDU 7: to model two types of relations ,
EDU 8: including semantic relations and opinion relations .
EDU 9: next , a co-ranking algorithm is proposed
EDU 10: to estimate the confidence of each candidate ,
EDU 11: and the candidates with higher confidence will be extracted as opinion targets/words .
EDU 12: in this way , different relations have cooperative effects on each candidate's confidence estimation .
EDU 13: moreover , word preference is captured and incorporated into our co-ranking algorithm .
EDU 14: in this way , our co-ranking is personalized
EDU 15: and each candidate's confidence is only determined by its preferred collocations .
EDU 16: it helps to improve the extraction precision .
EDU 17: the experimental results on three data sets with different sizes and languages show
EDU 18: that our approach achieves better performance than state-of-the-art methods .
EDU 0:
EDU 1: this paper proposes a novel context-aware method
EDU 2: for analyzing sentiment at the level of individual sentences .
EDU 3: most existing machine learning approaches suffer from limitations in the modeling of complex linguistic structures across sentences
EDU 4: and often fail to capture non-local contextual cues
EDU 5: that are important for sentiment interpretation .
EDU 6: in contrast , our approach allows structured modeling of sentiment
EDU 7: while taking into account both local and global contextual information .
EDU 8: specifically , we encode intuitive lexical and discourse knowledge as expressive constraints
EDU 9: and integrate them into the learning of conditional random field models
EDU 10: via posterior regularization .
EDU 11: the context-aware constraints provide additional power to the crf model
EDU 12: and can guide semi-supervised learning
EDU 13: when labeled data is limited .
EDU 14: experiments on standard product review datasets show
EDU 15: that our method outperforms the state-of-the-art methods in both the supervised and semi-supervised settings .
EDU 0:
EDU 1: product feature mining is a key subtask in fine-grained opinion mining .
EDU 2: previous works often use syntax constituents in this task .
EDU 3: however , syntax-based methods can only use discrete contextual information ,
EDU 4: which may suffer from data sparsity .
EDU 5: this paper proposes a novel product feature mining method
EDU 6: which leverages lexical and contextual semantic clues .
EDU 7: the lexical semantic clue verifies whether a candidate term is related to the target product ,
EDU 8: and the contextual semantic clue serves as a soft pattern miner
EDU 9: to find candidates ,
EDU 10: which exploits semantics of each word in context
EDU 11: so as to alleviate the data sparsity problem .
EDU 12: we build a semantic similarity graph
EDU 13: to encode the lexical semantic clue ,
EDU 14: and employ a convolutional neural model
EDU 15: to capture the contextual semantic clue .
EDU 16: then label propagation is applied to combine both semantic clues .
EDU 17: experimental results show
EDU 18: that our semantics-based method significantly outperforms conventional syntax-based approaches ,
EDU 19: which not only mines product features more accurately ,
EDU 20: but also extracts more infrequent product features .
EDU 0:
EDU 1: aspect extraction is an important task in sentiment analysis .
EDU 2: topic modeling is a popular method for the task .
EDU 3: however , unsupervised topic models often generate incoherent aspects .
EDU 4: to address the issue ,
EDU 5: several knowledge-based models have been proposed
EDU 6: to incorporate prior knowledge
EDU 7: provided by the user
EDU 8: to guide modeling .
EDU 9: in this paper , we take a major step forward
EDU 10: and show
EDU 11: that in the big data era , without any user input , it is possible to learn prior knowledge automatically from a large amount of review data available on the web .
EDU 12: such knowledge can then be used by a topic model
EDU 13: to discover more coherent aspects .
EDU 14: there are two key challenges
EDU 15: : ( 0 ) learning quality knowledge from reviews of diverse domains ,
EDU 16: and ( 0 ) making the model fault-tolerant to handle possibly wrong knowledge .
EDU 17: a novel approach is proposed
EDU 18: to solve these problems .
EDU 19: experimental results
EDU 20: using reviews from 00 domains
EDU 21: show
EDU 22: that the proposed approach achieves significant improvements over state-of-the-art baselines .
EDU 0:
EDU 1: spectral methods offer scalable alternatives to markov chain monte carlo and expectation maximization .
EDU 2: however , these new methods lack the rich priors
EDU 3: associated with probabilistic models .
EDU 4: we examine arora et al.'s anchor words algorithm for topic modeling
EDU 5: and develop new , regularized algorithms
EDU 6: that not only mathematically resemble gaussian and dirichlet priors
EDU 7: but also improve the interpretability of topic models .
EDU 8: our new regularization approaches make these efficient algorithms more flexible ;
EDU 9: we also show
EDU 10: that these methods can be combined with informed priors .
EDU 0:
EDU 1: we consider the problem
EDU 2: of automatically inferring latent character types in a collection of 00,000 english novels
EDU 3: published between 0000 and 0000 .
EDU 4: unlike prior work
EDU 5: in which character types are assumed responsible for probabilistically generating all text
EDU 6: associated with a character ,
EDU 7: we introduce a model
EDU 8: that employs multiple effects
EDU 9: to account for the influence of extra-linguistic information
EDU 10: ( such as author ) .
EDU 11: in an empirical evaluation , we find
EDU 12: that this method leads to improved agreement with the preregistered judgments of a literary scholar ,
EDU 13: complementing the results of alternative models .
EDU 0:
EDU 1: wikification for tweets aims to automatically identify each concept mention in a tweet
EDU 2: and link it to a concept referent in a knowledge base
EDU 3: ( e.g. , wikipedia ) .
EDU 4: due to the shortness of a tweet ,
EDU 5: a collective inference model
EDU 6: incorporating global evidence from multiple mentions and concepts
EDU 7: is more appropriate than a non-collective approach
EDU 8: which links one mention at a time .
EDU 9: in addition , it is challenging to generate sufficient high-quality labeled data for supervised models at low cost .
EDU 10: to tackle these challenges ,
EDU 11: we propose a novel semi-supervised graph regularization model
EDU 12: to incorporate both local and global evidence from multiple tweets through three fine-grained relations .
EDU 13: in order to identify semantically-related mentions for collective inference ,
EDU 14: we detect meta path-based semantic relations through social networks .
EDU 15: compared to the state-of-the-art supervised model
EDU 16: trained from 000 % labeled data ,
EDU 17: our proposed approach achieves comparable performance with 00 % labeled data and obtains 0 % absolute f0 gain with 00 % labeled data .
EDU 0:
EDU 1: in order to extract entities of a fine-grained category from semi-structured data in web pages ,
EDU 2: existing information extraction systems rely on seed examples or redundancy across multiple web pages .
EDU 3: in this paper , we consider a new zero-shot learning task of extracting entities
EDU 4: specified by a natural language query
EDU 5: ( in place of seeds )
EDU 6: given only a single web page .
EDU 7: our approach defines a log-linear model over latent extraction predicates ,
EDU 8: which select lists of entities from the web page .
EDU 9: the main challenge is to define features on widely varying candidate entity lists .
EDU 10: we tackle this
EDU 11: by abstracting list elements
EDU 12: and using aggregate statistics to define features .
EDU 13: finally , we created a new dataset of diverse queries and web pages ,
EDU 14: and show
EDU 15: that our system achieves significantly better accuracy than a natural baseline .
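The log-linear model over latent extraction predicates can be sketched as follows. Each candidate predicate selects a list of entities, the list is abstracted into aggregate statistics, and a log-linear distribution scores the predicates. The feature set and predicate names here are hypothetical; the paper's features are richer.

```python
import math

def list_features(entities):
    """Aggregate statistics over a candidate entity list
    (hypothetical feature set for illustration)."""
    n = len(entities)
    return {
        "mean_tokens": sum(len(e.split()) for e in entities) / n,
        "frac_capitalized": sum(e[:1].isupper() for e in entities) / n,
        "log_size": math.log(n),
    }

def predicate_posterior(candidates, weights):
    """Log-linear distribution over latent extraction predicates:
    p(z | page, query) proportional to exp(w . phi(list selected by z))."""
    scores = [sum(weights.get(k, 0.0) * v for k, v in list_features(ents).items())
              for _, ents in candidates]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return {name: e / z for (name, _), e in zip(candidates, exps)}
```

With a weight favoring capitalized elements, a list of person names outscores a list of boilerplate links, even though the two lists share no surface vocabulary; that is the point of defining features on aggregate statistics rather than on the raw elements.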
EDU 0:
EDU 1: we present an incremental joint framework
EDU 2: to simultaneously extract entity mentions and relations
EDU 3: using structured perceptron with efficient beam-search .
EDU 4: a segment-based decoder
EDU 5: based on the idea of semi-markov chain
EDU 6: is adopted in the new framework
EDU 7: as opposed to traditional token-based tagging .
EDU 8: in addition , by virtue of the inexact search ,
EDU 9: we developed a number of new and effective global features as soft constraints
EDU 10: to capture the inter-dependency among entity mentions and relations .
EDU 11: experiments on automatic content extraction ( ace ) corpora demonstrate
EDU 12: that our joint model significantly outperforms a strong pipelined baseline ,
EDU 13: which attains better performance than the best-reported end-to-end system .
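The structured perceptron with beam search can be sketched in a few lines. This is a token-level stand-in for the paper's segment-based semi-Markov decoder, with a toy emission/transition feature template; the update rule is the standard structured perceptron step.

```python
def beam_tag(tokens, tags, weights, beam=4):
    """Beam-search decoding with simple emission and transition features.
    (Token-based toy decoder, not the semi-Markov segment decoder.)"""
    agenda = [([], 0.0)]
    for tok in tokens:
        expanded = []
        for hist, score in agenda:
            prev = hist[-1] if hist else "<s>"
            for t in tags:
                s = score + weights.get((tok, t), 0.0) + weights.get((prev, t), 0.0)
                expanded.append((hist + [t], s))
        agenda = sorted(expanded, key=lambda h: -h[1])[:beam]  # inexact search
    return agenda[0][0]

def perceptron_update(weights, gold_feats, pred_feats):
    """Structured perceptron step: reward gold features, penalize predicted ones."""
    for f in gold_feats:
        weights[f] = weights.get(f, 0.0) + 1.0
    for f in pred_feats:
        weights[f] = weights.get(f, 0.0) - 1.0
```

The inexact (beam-pruned) search is what makes rich global features over entity mentions and relations affordable, since each feature only needs to be scored on the surviving hypotheses.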
EDU 0:
EDU 1: we investigate whether parsers can be used for self-monitoring in surface realization
EDU 2: in order to avoid egregious errors
EDU 3: involving "vicious" ambiguities ,
EDU 4: namely those
EDU 5: where the intended interpretation fails to be considerably more likely than alternative ones .
EDU 6: using parse accuracy in a simple reranking strategy for self-monitoring ,
EDU 7: we find
EDU 8: that with a state-of-the-art averaged perceptron realization ranking model ,
EDU 9: bleu scores cannot be improved
EDU 10: with any of the well-known treebank parsers
EDU 11: we tested ,
EDU 12: since these parsers too often make errors
EDU 13: that human readers would be unlikely to make .
EDU 14: however , by using an svm ranker
EDU 15: to combine the realizer's model score together with features from multiple parsers ,
EDU 16: including ones designed to make the ranker more robust to parsing mistakes ,
EDU 17: we show
EDU 18: that significant increases in bleu scores can be achieved .
EDU 19: moreover , via a targeted manual analysis ,
EDU 20: we demonstrate
EDU 21: that the svm reranker frequently manages to avoid vicious ambiguities ,
EDU 22: while its ranking errors tend to affect fluency much more often than adequacy .
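The ranker that combines the realizer's model score with parser-derived features can be sketched as a pairwise linear ranker. This uses a perceptron-style hinge update as a stand-in for the SVM ranker, and the feature names are illustrative.

```python
def score(w, feats):
    """Linear score of one realization under weight vector w."""
    return sum(w.get(k, 0.0) * v for k, v in feats.items())

def rank_update(w, better, worse, margin=1.0, lr=0.1):
    """Pairwise hinge update: push the preferred realization's score above
    the alternative's by at least `margin`. (Perceptron-style stand-in
    for the SVM ranker; feature names are hypothetical.)"""
    if score(w, better) - score(w, worse) < margin:
        for k, v in better.items():
            w[k] = w.get(k, 0.0) + lr * v
        for k, v in worse.items():
            w[k] = w.get(k, 0.0) - lr * v
```

The point of the combination is visible in the toy features: a realization can have a slightly worse realizer score but a much better parser-match feature, and the learned weights let the parser evidence override the realizer when that avoids a "vicious" ambiguity.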
EDU 0:
EDU 1: we present a simple , data-driven approach to generation from knowledge bases ( kb ) .
EDU 2: a key feature of this approach is that grammar induction is driven by the extended domain of locality principle of tag ( tree adjoining grammar ) ;
EDU 3: and that it takes into account both syntactic and semantic information .
EDU 4: the resulting extracted tag includes a unification-based semantics
EDU 5: and can be used by an existing surface realiser
EDU 6: to generate sentences from kb data .
EDU 7: experimental evaluation on the kbgen data shows
EDU 8: that our model outperforms a data-driven generate-and-rank approach
EDU 9: based on an automatically induced probabilistic grammar ;
EDU 10: and is comparable with a handcrafted symbolic approach .
EDU 0:
EDU 1: we present a hybrid approach to sentence simplification
EDU 2: which combines deep semantics and monolingual machine translation
EDU 3: to derive simple sentences from complex ones .
EDU 4: the approach differs from previous work in two main ways .
EDU 5: first , it is semantic based
EDU 6: in that it takes as input a deep semantic representation rather than e.g. , a sentence or a parse tree .
EDU 7: second , it combines a simplification model for splitting and deletion with a monolingual translation model for phrase substitution and reordering .
EDU 8: when compared against current state-of-the-art methods ,
EDU 9: our model yields significantly simpler output
EDU 10: that is both grammatical and meaning preserving .
EDU 0:
EDU 1: this paper is concerned with building linguistic resources and statistical parsers for deep grammatical relation ( gr ) analysis of chinese texts .
EDU 2: a set of linguistic rules is defined
EDU 3: to explore implicit phrase structural information
EDU 4: and thus build high-quality gr annotations
EDU 5: that are represented as general directed dependency graphs .
EDU 6: the reliability of this linguistically-motivated gr extraction procedure is highlighted by manual evaluation .
EDU 7: based on the converted corpus ,
EDU 8: we study transition-based , data-driven models for gr parsing .
EDU 9: we present a novel transition system
EDU 10: which suits gr graphs better than existing systems .
EDU 11: the key idea is to introduce a new type of transition
EDU 12: that reorders top k elements in the memory module .
EDU 13: evaluation gauges
EDU 14: how successful gr parsing for chinese can be by applying data-driven models .
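The key transition described above, which reorders the top k elements of the memory module, can be sketched directly. The stack representation and the permutation encoding are assumptions for illustration.

```python
def reorder_top_k(stack, order):
    """New transition type: permute the top k items of the memory module
    (here a plain list with the top at the end), which lets the parser
    produce crossing arcs needed for general GR dependency graphs.
    `order` lists positions of the current top-k items (0 = topmost)
    in the desired new top-to-bottom order."""
    k = len(order)
    top = stack[-k:][::-1]               # current top-k, topmost first
    new_top = [top[i] for i in order]    # permuted, topmost first
    return stack[:-k] + new_top[::-1]
```

For example, `order=[1, 0]` swaps the two topmost items, and longer permutations rotate deeper elements to the top so a later arc-building transition can reach them.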
EDU 0:
EDU 1: this paper proposes a simple yet effective framework for semi-supervised dependency parsing at entire tree level ,
EDU 2: referred to as ambiguity-aware ensemble training .
EDU 3: instead of only using 0-best parse trees in previous work ,
EDU 4: our core idea is to utilize parse forest ( ambiguous labelings )
EDU 5: to combine multiple 0-best parse trees
EDU 6: generated from diverse parsers on unlabeled data .
EDU 7: with a conditional random field based probabilistic dependency parser ,
EDU 8: our training objective is to maximize mixed likelihood of labeled data and auto-parsed unlabeled data with ambiguous labelings .
EDU 9: this framework offers two promising advantages .
EDU 10: 0 ) ambiguity encoded in parse forests compromises noise in 0-best parse trees .
EDU 11: during training ,
EDU 12: the parser is aware of these ambiguous structures ,
EDU 13: and has the flexibility
EDU 14: to distribute probability mass to its preferred parse trees
EDU 15: as long as the likelihood improves .
EDU 16: 0 ) diverse syntactic structures
EDU 17: produced by different parsers
EDU 18: can be naturally compiled into forest ,
EDU 19: offering complementary strength to our single-view parser .
EDU 20: experimental results on benchmark data show
EDU 21: that our method significantly outperforms the baseline supervised parser and other entire-tree based semi-supervised methods , such as self-training , co-training and tri-training .
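The unlabeled-data term in the mixed likelihood objective can be sketched for a toy tree space: the parser maximizes the probability mass assigned to the whole parse forest rather than to a single tree. This enumerates trees explicitly, whereas the CRF-based parser would use dynamic programming; the tree names are illustrative.

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def forest_loglik(scores, forest):
    """Log-likelihood contribution of one unlabeled sentence with
    ambiguous labelings: log sum_{y in forest} p(y|x),
    where p(y|x) is proportional to exp(score(y))."""
    return logsumexp([scores[y] for y in forest]) - logsumexp(list(scores.values()))
```

Because the objective only asks that the forest as a whole be likely, the parser is free to shift probability mass among the trees inside the forest, which is exactly the flexibility the abstract describes.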
EDU 0:
EDU 1: lexical resource alignment has been an active field of research over the last decade .
EDU 2: however , prior methods
EDU 3: for aligning lexical resources
EDU 4: have been either specific to a particular pair of resources ,
EDU 5: or heavily dependent on the availability of hand-crafted alignment data for the pair of resources
EDU 6: to be aligned .
EDU 7: here we present a unified approach
EDU 8: that can be applied to an arbitrary pair of lexical resources ,
EDU 9: including machine-readable dictionaries with no network structure .
EDU 10: our approach leverages a similarity measure
EDU 11: that enables the structural comparison of senses across lexical resources ,
EDU 12: achieving state-of-the-art performance on the task
EDU 13: of aligning wordnet to three different collaborative resources :
EDU 14: wikipedia , wiktionary and omegawiki .
EDU 0:
EDU 1: using distributional analysis methods to compute semantic proximity links between words has become commonplace in nlp .
EDU 2: the resulting relations are often noisy or difficult to interpret in general .
EDU 3: this paper focuses on the issues
EDU 4: of evaluating a distributional resource
EDU 5: and filtering the relations
EDU 6: it contains ,
EDU 7: but instead of considering it in the abstract ,
EDU 8: we focus on pairs of words in context .
EDU 9: in a discourse , we are interested in knowing if the semantic link between two items is a by-product of textual coherence or is irrelevant .
EDU 10: we first set up a human annotation of semantic links
EDU 11: with or without contextual information
EDU 12: to show the importance of the textual context
EDU 13: in evaluating the relevance of semantic similarity ,
EDU 14: and to assess the prevalence of actual semantic relations between word tokens .
EDU 15: we then built an experiment
EDU 16: to automatically predict this relevance ,
EDU 17: evaluated on the reliable reference data set
EDU 18: which was the outcome of the first annotation .
EDU 19: we show
EDU 20: that in-document information greatly improves the prediction
EDU 21: made by the similarity level alone .
EDU 0:
EDU 1: vector space models ( vsms ) represent word meanings as points in a high dimensional space .
EDU 2: vsms are typically created
EDU 3: using large text corpora ,
EDU 4: and so represent word semantics
EDU 5: as observed in text .
EDU 6: we present a new algorithm ( jnnse )
EDU 7: that can incorporate a measure of semantics
EDU 8: not previously used
EDU 9: to create vsms :
EDU 10: brain activation data recorded
EDU 11: while people read words .
EDU 12: the resulting model takes advantage of the complementary strengths and weaknesses of corpus and brain activation data
EDU 13: to give a more complete representation of semantics .
EDU 14: evaluations show
EDU 15: that the model 0 ) matches a behavioral measure of semantics more closely ,
EDU 16: 0 ) can be used to predict corpus data for unseen words
EDU 17: and 0 ) has predictive power
EDU 18: that generalizes across brain imaging technologies and across subjects .
EDU 19: we believe
EDU 20: that the model is thus a more faithful representation of mental vocabularies .
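The joint model can be sketched as a shared low-rank factorization: one latent word matrix must explain both the corpus-derived matrix and the brain-activation matrix, whose rows are aligned to the same words. This sketch uses plain alternating least squares and omits the non-negativity and sparsity constraints of the actual NNSE-style model.

```python
import numpy as np

def joint_factorize(Xc, Xb, k, iters=50, seed=0):
    """Joint low-rank factorization in the spirit of JNNSE:
    find shared word codes A with Xc ~= A @ Dc and Xb ~= A @ Db.
    Unconstrained alternating least squares (simplifying assumption)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((Xc.shape[0], k))
    for _ in range(iters):
        Dc = np.linalg.lstsq(A, Xc, rcond=None)[0]      # corpus dictionary
        Db = np.linalg.lstsq(A, Xb, rcond=None)[0]      # brain dictionary
        D = np.hstack([Dc, Db])
        X = np.hstack([Xc, Xb])
        A = np.linalg.lstsq(D.T, X.T, rcond=None)[0].T  # shared word codes
    Dc = np.linalg.lstsq(A, Xc, rcond=None)[0]
    Db = np.linalg.lstsq(A, Xb, rcond=None)[0]
    return A, Dc, Db
```

Because `A` is shared, regularities present only in the brain data shape the codes used to reconstruct the corpus matrix, which is how the model can predict corpus data for words unseen in one of the two sources.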
EDU 0:
EDU 1: we use single-agent and multi-agent reinforcement learning ( rl )
EDU 2: for learning dialogue policies in a resource allocation negotiation scenario .
EDU 3: two agents learn concurrently
EDU 4: by interacting with each other
EDU 5: without any need for simulated users ( sus )
EDU 6: to train against
EDU 7: or corpora
EDU 8: to learn from .
EDU 9: in particular , we compare the q-learning , policy hill-climbing ( phc ) and win or learn fast policy hill-climbing ( phc-wolf ) algorithms ,
EDU 10: varying the scenario complexity ( state space size ) , the number of training episodes , the learning rate , and the exploration rate .
EDU 11: our results show
EDU 12: that generally q-learning fails to converge
EDU 13: whereas phc and phc-wolf always converge and perform similarly .
EDU 14: we also show
EDU 15: that very high , gradually decreasing exploration rates are required for convergence .
EDU 16: we conclude
EDU 17: that multi-agent rl of dialogue policies is a promising alternative
EDU 18: to using single-agent rl and sus
EDU 19: or learning directly from corpora .
EDU 0:
EDU 1: text-level discourse parsing remains a challenge .
EDU 2: the current state-of-the-art overall accuracy in relation assignment is 00.00 % ,
EDU 3: achieved by joty et al. ( 0000 ) .
EDU 4: however , their model has a high order of time complexity ,
EDU 5: and thus cannot be applied in practice .
EDU 6: in this work , we develop a much faster model
EDU 7: whose time complexity is linear in the number of sentences .
EDU 8: our model adopts a greedy bottom-up approach ,
EDU 9: with two linear-chain crfs
EDU 10: applied in cascade as local classifiers .
EDU 11: to enhance the accuracy of the pipeline ,
EDU 12: we add additional constraints in the viterbi decoding of the first crf .
EDU 13: in addition to efficiency , our parser also significantly outperforms the state of the art .
EDU 14: moreover , our novel approach of post-editing ,
EDU 15: which modifies a fully-built tree by considering information from constituents on upper levels ,
EDU 16: can further improve the accuracy .
EDU 0:
EDU 1: negative expressions are common in natural language text
EDU 2: and play a critical role in information extraction .
EDU 3: however , the performance of current systems is far from satisfactory ,
EDU 4: largely due to their focus on intra-sentence information and their failure
EDU 5: to consider inter-sentence information .
EDU 6: in this paper , we propose a graph model
EDU 7: to enrich intra-sentence features with inter-sentence features from both lexical and topic perspectives .
EDU 8: evaluation on the * sem 0000 shared task corpus indicates the usefulness of contextual discourse information in negation focus identification
EDU 9: and justifies the effectiveness of our graph model
EDU 10: in capturing such global information .
EDU 0:
EDU 1: automatic extraction of new words is an indispensable precursor to many nlp tasks
EDU 2: such as chinese word segmentation , named entity extraction , and sentiment analysis .
EDU 3: this paper aims at extracting new sentiment words from large-scale user-generated content .
EDU 4: we propose a fully unsupervised , purely data-driven framework for this purpose .
EDU 5: we design two statistical measures ,
EDU 6: one to quantify the utility of a lexical pattern
EDU 7: and one to measure the likelihood of a word being a new word .
EDU 8: the method is almost free of linguistic resources ( except pos tags ) ,
EDU 9: and requires no elaborated linguistic rules .
EDU 10: we also demonstrate
EDU 11: how new sentiment words will benefit sentiment analysis .
EDU 12: experimental results demonstrate the effectiveness of the proposed method .
EDU 0:
EDU 1: the sentiment
EDU 2: captured in opinionated text
EDU 3: provides interesting and valuable information for social media services .
EDU 4: however , due to the complexity and diversity of linguistic representations ,
EDU 5: it is challenging to build a framework
EDU 6: that accurately extracts such sentiment .
EDU 7: we propose a semi-supervised framework
EDU 8: for generating a domain-specific sentiment lexicon
EDU 9: and inferring sentiments at the segment level .
EDU 10: our framework can greatly reduce the human effort
EDU 11: for building a domain-specific sentiment lexicon with high quality .
EDU 12: specifically , in our evaluation , working with just 00 manually labeled reviews ,
EDU 13: it generates a domain-specific sentiment lexicon
EDU 14: that yields weighted average f-measure gains of 0 % .
EDU 15: our sentiment classification model achieves approximately 0 % greater accuracy than a state-of-the-art approach
EDU 16: based on elementary discourse units .
EDU 0:
EDU 1: we study the problem of generating an english sentence
EDU 2: given an underlying probabilistic grammar , a world and a communicative goal .
EDU 3: we model the generation problem as a markov decision process with a suitably defined reward function
EDU 4: that reflects the communicative goal .
EDU 5: we then use probabilistic planning to solve the mdp
EDU 6: and generate a sentence
EDU 7: that , with high probability , accomplishes the communicative goal .
EDU 8: we show empirically
EDU 9: that our approach can generate complex sentences with a speed
EDU 10: that generally matches or surpasses the state of the art .
EDU 11: further , we show
EDU 12: that our approach is anytime
EDU 13: and can handle complex communicative goals ,
EDU 14: including negated goals .
EDU 0:
EDU 1: a vast majority of l0 vocabulary acquisition occurs through incidental learning
EDU 2: during reading ( nation , 0000 ; schmitt et al. , 0000 ) .
EDU 3: we propose a probabilistic approach
EDU 4: to generating code-mixed text as an l0 technique
EDU 5: for increasing retention in adult lexical learning
EDU 6: through reading .
EDU 7: our model
EDU 8: takes as input a bilingual dictionary and an english text ,
EDU 9: and generates a code-switched text
EDU 10: that optimizes a defined " learnability " metric
EDU 11: by constructing a factor graph over lexical mentions .
EDU 12: using an artificial language vocabulary ,
EDU 13: we evaluate a set of algorithms
EDU 14: for generating code-switched text automatically
EDU 15: by presenting it to mechanical turk subjects
EDU 16: and measuring recall in a sentence completion task .
EDU 0:
EDU 1: chinese is an ancient hieroglyphic language .
EDU 2: its writing gives few explicit cues to structure .
EDU 3: therefore , segmenting and parsing chinese are more difficult and less accurate .
EDU 4: in this paper , we propose an omni-word feature and a soft constraint method for chinese relation extraction .
EDU 5: the omni-word feature uses every potential word in a sentence as a lexicon feature ,
EDU 6: reducing errors
EDU 7: caused by word segmentation .
EDU 8: in order to utilize the structure information of a relation instance ,
EDU 9: we discuss
EDU 10: how soft constraints can be used
EDU 11: to capture the local dependency .
EDU 12: both the omni-word feature and the soft constraint make better use of sentence information
EDU 13: and minimize the influences
EDU 14: caused by chinese word segmentation and parsing .
EDU 15: we test these methods on the ace 0000 rdc chinese corpus .
EDU 16: the results show a significant improvement in chinese relation extraction ,
EDU 17: outperforming other methods in f-score by 00 % in 0 relation types and 00 % in 00 relation subtypes .
EDU 0:
EDU 1: active learning ( al ) has been proven effective
EDU 2: in reducing human annotation effort in nlp .
EDU 3: however , previous studies on al are limited to applications in a single language .
EDU 4: this paper proposes a bilingual active learning paradigm for relation classification ,
EDU 5: where the unlabeled instances are first jointly chosen in terms of their prediction uncertainty scores in two languages
EDU 6: and then manually labeled by an oracle .
EDU 7: instead of using a parallel corpus ,
EDU 8: labeled and unlabeled instances in one language are translated into ones in the other language
EDU 9: and all instances in both languages are then fed into a bilingual active learning engine as pseudo parallel corpora .
EDU 10: experimental results on the ace rdc 0000 chinese and english corpora show
EDU 11: that bilingual active learning for relation classification significantly outperforms monolingual active learning .
EDU 0:
EDU 1: accurately segmenting a citation string into fields for authors , titles , etc. is a challenging task
EDU 2: because the output typically obeys various global constraints .
EDU 3: previous work has shown
EDU 4: that modeling soft constraints ,
EDU 5: where the model is encouraged ,
EDU 6: but not required , to obey the constraints ,
EDU 7: can substantially improve segmentation performance .
EDU 8: on the other hand , for imposing hard constraints ,
EDU 9: dual decomposition is a popular technique for efficient prediction
EDU 10: given existing algorithms for unconstrained inference .
EDU 11: we extend dual decomposition
EDU 12: to perform prediction subject to soft constraints .
EDU 13: moreover , with a technique for performing inference
EDU 14: given soft constraints ,
EDU 15: it is easy to automatically generate large families of constraints
EDU 16: and learn their costs with a simple convex optimization problem
EDU 17: during training .
EDU 18: this allows us to obtain substantial gains in accuracy on a new , challenging citation extraction dataset .
EDU 0:
EDU 1: an important search task in the biomedical domain is to find medical records of patients
EDU 2: who are qualified for a clinical trial .
EDU 3: one commonly used approach is to apply nlp tools
EDU 4: to map terms from queries and documents to concepts
EDU 5: and then compute the relevance scores
EDU 6: based on the concept-based representation .
EDU 7: however , the mapping results are not perfect ,
EDU 8: and no previous work has studied how to deal with them in the retrieval process .
EDU 9: in this paper , we focus on addressing the limitations
EDU 10: caused by the imperfect mapping results
EDU 11: and study how to further improve the retrieval performance of the concept-based ranking methods .
EDU 12: in particular , we apply axiomatic approaches
EDU 13: and propose two weighting regularization methods
EDU 14: that adjust the weighting
EDU 15: based on the relations among the concepts .
EDU 16: experimental results show
EDU 17: that the proposed methods are effective
EDU 18: in improving the retrieval performance ,
EDU 19: and their performance is comparable to other top-performing systems in the trec medical records track .
EDU 0:
EDU 1: although the distributional hypothesis has been applied successfully in many natural language processing tasks ,
EDU 2: systems
EDU 3: using distributional information
EDU 4: have been limited to a single domain
EDU 5: because the distribution of a word can vary between domains
EDU 6: as the word's predominant meaning changes .
EDU 7: however , if it were possible to predict how the distribution of a word changes from one domain to another ,
EDU 8: the predictions could be used
EDU 9: to adapt a system
EDU 10: trained in one domain
EDU 11: to work in another .
EDU 12: we propose an unsupervised method
EDU 13: to predict the distribution of a word in one domain ,
EDU 14: given its distribution in another domain .
EDU 15: we evaluate our method on two tasks :
EDU 16: cross-domain part-of-speech tagging and cross-domain sentiment classification .
EDU 17: in both tasks , our method significantly outperforms competitive baselines
EDU 18: and returns results
EDU 19: that are statistically comparable to current state-of-the-art methods ,
EDU 20: while requiring no task-specific customisations .
EDU 0:
EDU 1: we introduce the problem of generation in distributional semantics :
EDU 2: given a distributional vector
EDU 3: representing some meaning ,
EDU 4: how can we generate the phrase
EDU 5: that best expresses that meaning ?
EDU 6: we motivate this novel challenge on theoretical and practical grounds
EDU 7: and propose a simple data-driven approach to the estimation of generation functions .
EDU 8: we test this in a monolingual scenario
EDU 9: ( paraphrase generation )
EDU 10: as well as in a cross-lingual setting
EDU 11: ( translation
EDU 12: by synthesizing adjective-noun phrase vectors in english
EDU 13: and generating the equivalent expressions in italian ) .
EDU 0:
EDU 1: traditional models of distributional semantics suffer from computational issues such as data sparsity for individual lexemes and complexities
EDU 2: of modeling semantic composition
EDU 3: when dealing with structures larger than single lexical items .
EDU 4: in this work , we present a frequency-driven paradigm for robust distributional semantics in terms of semantically cohesive lineal constituents , or motifs .
EDU 5: the framework subsumes issues such as differential compositional as well as non-compositional behavior of phrasal constituents ,
EDU 6: and circumvents some problems of data sparsity by design .
EDU 7: we design a segmentation model
EDU 8: to optimally partition a sentence into lineal constituents ,
EDU 9: which can be used to define distributional contexts
EDU 10: that are less noisy , semantically more interpretable , and linguistically disambiguated .
EDU 11: hellinger pca embeddings
EDU 12: learnt using the framework
EDU 13: show competitive results on empirical tasks .
EDU 0:
EDU 1: representing predicates in terms of their argument distribution is common practice in nlp .
EDU 2: multi-word predicates ( mwps ) in this context are often either disregarded
EDU 3: or considered as fixed expressions .
EDU 4: the latter treatment is unsatisfactory in two ways :
EDU 5: ( 0 ) identifying mwps is notoriously difficult ,
EDU 6: ( 0 ) mwps show varying degrees of compositionality
EDU 7: and could benefit from taking into account the identity of their component parts .
EDU 8: we propose a novel approach
EDU 9: that integrates the distributional representation of multiple sub-sets of the mwp's words .
EDU 10: we assume a latent distribution over sub-sets of the mwp ,
EDU 11: and estimate it relative to a downstream prediction task .
EDU 12: focusing on the supervised identification of lexical inference relations ,
EDU 13: we compare against state-of-the-art baselines
EDU 14: that consider a single sub-set of an mwp ,
EDU 15: obtaining substantial improvements .
EDU 16: to our knowledge , this is the first work
EDU 17: to address lexical relations between mwps of varying degrees of compositionality within distributional semantics .
EDU 0:
EDU 1: the ability
EDU 2: to accurately represent sentences
EDU 3: is central to language understanding .
EDU 4: we describe a convolutional architecture
EDU 5: dubbed the dynamic convolutional neural network ( dcnn )
EDU 6: that we adopt for the semantic modelling of sentences .
EDU 7: the network uses dynamic k-max pooling , a global pooling operation over linear sequences .
EDU 8: the network handles input sentences of varying length
EDU 9: and induces a feature graph over the sentence
EDU 10: that is capable of explicitly capturing short and long-range relations .
EDU 11: the network does not rely on a parse tree
EDU 12: and is easily applicable to any language .
EDU 13: we test the dcnn in four experiments :
EDU 14: small scale binary and multi-class sentiment prediction , six-way question classification and twitter sentiment prediction by distant supervision .
EDU 15: the network achieves excellent performance in the first three tasks
EDU 16: and a greater than 00 % error reduction in the last task with respect to the strongest baseline .
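The k-max pooling operation at the heart of the DCNN can be sketched directly: keep the k largest values of a linear sequence while preserving their original order. This is a minimal sketch of the pooling step only (in the dynamic variant, k additionally depends on sentence length and network depth).

```python
# Minimal sketch of k-max pooling: select the k largest values of a
# sequence, keeping them in their original order.
def k_max_pooling(seq, k):
    if k >= len(seq):
        return list(seq)
    # indices of the k largest values, then restored to sequence order
    top = sorted(range(len(seq)), key=lambda i: seq[i], reverse=True)[:k]
    return [seq[i] for i in sorted(top)]

print(k_max_pooling([1, 5, 2, 9, 3], 2))  # -> [5, 9]
```

Because order is preserved, the pooled values still reflect relative positions in the sentence, which is what lets later layers capture long-range relations.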
EDU 0:
EDU 1: we propose an online learning algorithm
EDU 2: based on tensor-space models .
EDU 3: a tensor-space model represents data in a compact way ,
EDU 4: and via rank-0 approximation
EDU 5: the weight tensor can be made highly structured ,
EDU 6: resulting in a significantly smaller number of free parameters
EDU 7: to be estimated
EDU 8: than in comparable vector-space models .
EDU 9: this regularizes the model complexity
EDU 10: and makes the tensor model highly effective in situations
EDU 11: where a large feature set is defined
EDU 12: but very limited resources are available for training .
EDU 13: we apply the proposed algorithm to a parsing task ,
EDU 14: and show
EDU 15: that even with very little training data the learning algorithm
EDU 16: based on a tensor model
EDU 17: performs well ,
EDU 18: and gives significantly better results than standard learning algorithms
EDU 19: based on traditional vector-space models .
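The parameter saving from a low-rank weight tensor can be made concrete with a rank-1 example. This is a hedged sketch, not the paper's model: with a 2-mode weight tensor W = u ⊗ v, the score ⟨W, X⟩ of a feature matrix X factorizes, so only len(u) + len(v) parameters are stored instead of len(u) × len(v).

```python
# Sketch: scoring with a rank-1 weight tensor W = u (outer) v.
# <W, X> = sum_ij u[i] * X[i][j] * v[j], i.e. u^T X v.
def rank1_score(u, v, X):
    """X is a 2-mode feature tensor given as a list of rows."""
    return sum(u[i] * X[i][j] * v[j]
               for i in range(len(u)) for j in range(len(v)))

u, v = [1.0, 2.0], [3.0, 1.0]
X = [[1.0, 0.0], [0.0, 1.0]]
print(rank1_score(u, v, X))  # u^T X v = 1*3 + 2*1 = 5.0
```

Constraining W to low rank in this way is exactly what regularizes the model when the feature set is large but training data is scarce.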
EDU 0:
EDU 1: statistical phrase-based translation learns translation rules from bilingual corpora ,
EDU 2: and has traditionally only used monolingual evidence
EDU 3: to construct features
EDU 4: that rescore existing translation candidates .
EDU 5: in this work , we present a semi-supervised graph-based approach
EDU 6: for generating new translation rules
EDU 7: that leverages bilingual and monolingual data .
EDU 8: the proposed technique first constructs phrase graphs
EDU 9: using both source and target language monolingual corpora .
EDU 10: next , graph propagation identifies translations of phrases
EDU 11: that were not observed in the bilingual corpus ,
EDU 12: assuming
EDU 13: that similar phrases have similar translations .
EDU 14: we report results on a large arabic-english system and a medium-sized urdu-english system .
EDU 15: our proposed approach significantly improves the performance of competitive phrase-based systems ,
EDU 16: leading to consistent improvements between 0 and 0 bleu points on standard evaluation sets .
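The "similar phrases have similar translations" assumption can be illustrated with one step of graph propagation. This is a simplified sketch, not the paper's algorithm: an unlabeled phrase receives the similarity-weighted average of its neighbours' translation distributions.

```python
# Sketch of one propagation step over a phrase graph: an unlabeled
# phrase inherits a weighted average of its neighbours' translation
# distributions. Data structures are illustrative.
def propagate_step(neighbors, dists):
    """neighbors: {phrase: [(other_phrase, sim_weight), ...]};
    dists: {phrase: {translation: prob}} for labeled phrases."""
    new = {}
    for p, nbrs in neighbors.items():
        acc, z = {}, 0.0
        for q, w in nbrs:
            if q in dists:
                z += w
                for t, pr in dists[q].items():
                    acc[t] = acc.get(t, 0.0) + w * pr
        new[p] = {t: v / z for t, v in acc.items()} if z else {}
    return new
```

Iterating such steps spreads translation evidence from phrases seen in the bilingual corpus to monolingual-only phrases, which is how new translation rules are generated.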
EDU 0:
EDU 1: we present experiments in using discourse structure
EDU 2: for improving machine translation evaluation .
EDU 3: we first design two discourse-aware similarity measures ,
EDU 4: which use all-subtree kernels
EDU 5: to compare discourse parse trees in accordance with the rhetorical structure theory .
EDU 6: then , we show
EDU 7: that these measures can help improve a number of existing machine translation evaluation metrics both at the segment- and at the system-level .
EDU 8: rather than proposing a single new metric ,
EDU 9: we show
EDU 10: that discourse information is complementary to the state-of-the-art evaluation metrics ,
EDU 11: and thus should be taken into account in the development of future richer evaluation metrics .
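A tree kernel of the kind used above can be sketched compactly. This is a toy Collins–Duffy-style convolution kernel over node-labeled trees, not the paper's exact all-subtree kernel over RST parses: it counts matching tree fragments recursively.

```python
# Toy all-subtree (convolution) kernel: count common tree fragments.
# Trees are (label, [children]) tuples; labels are illustrative.
def subtree_kernel(t1, t2):
    def nodes(t):
        yield t
        for c in t[1]:
            yield from nodes(c)
    def C(a, b):
        # nodes match only if their labels and child-label sequences agree
        if a[0] != b[0] or [c[0] for c in a[1]] != [c[0] for c in b[1]]:
            return 0
        prod = 1
        for ca, cb in zip(a[1], b[1]):
            prod *= 1 + C(ca, cb)
        return prod
    return sum(C(a, b) for a in nodes(t1) for b in nodes(t2))
```

Applied to two discourse parse trees, a kernel of this shape yields a similarity score that can be combined with existing MT evaluation metrics.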
EDU 0:
EDU 1: this paper tackles the sparsity problem
EDU 2: in estimating phrase translation probabilities
EDU 3: by learning continuous phrase representations ,
EDU 4: whose distributed nature enables the sharing of related phrases in their representations .
EDU 5: a pair of source and target phrases are projected into continuous-valued vector representations in a low-dimensional latent space ,
EDU 6: where their translation score is computed by the distance between the pair in this new space .
EDU 7: the projection is performed by a neural network
EDU 8: whose weights are learned on parallel training data .
EDU 9: experimental evaluation has been performed on two wmt translation tasks .
EDU 10: our best result improves the performance of a state-of-the-art phrase-based statistical machine translation system
EDU 11: trained on wmt 0000 french-english data by up to 0.0 bleu points .
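The scoring step described above reduces to a distance in the latent space. In this hedged sketch the projections are given directly (in the paper they come from a learned neural network), and negative euclidean distance stands in for whatever distance the trained model uses.

```python
import math

# Sketch: score a source/target phrase pair by (negative) distance
# between their latent-space projections; closer pairs score higher.
def translation_score(src_vec, tgt_vec):
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(src_vec, tgt_vec)))
    return -d

print(translation_score([0.0, 0.0], [3.0, 4.0]))  # -> -5.0
```

Because related phrases share representation structure, rare phrase pairs inherit sensible scores from nearby frequent ones, which is what addresses the sparsity problem.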
EDU 0:
EDU 1: the automatic estimation of machine translation ( mt ) output quality is a hard task
EDU 2: in which the selection of the appropriate algorithm and the most predictive features over reasonably sized training sets plays a crucial role .
EDU 3: when moving from controlled lab evaluations to real-life scenarios
EDU 4: the task becomes even harder .
EDU 5: for current mt quality estimation ( qe ) systems , additional complexity comes from the difficulty
EDU 6: of modeling user and domain changes .
EDU 7: indeed , the instability of the systems with respect to data
EDU 8: coming from different distributions
EDU 9: calls for adaptive solutions
EDU 10: that react to new operating conditions .
EDU 11: to tackle this issue
EDU 12: we propose an online framework for adaptive qe
EDU 13: that targets reactivity and robustness to user and domain changes .
EDU 14: contrastive experiments in different testing conditions
EDU 15: involving user and domain changes
EDU 16: demonstrate the effectiveness of our approach .
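The online regime described above means the QE model is updated after every new instance rather than trained once. As a hedged sketch (the paper's actual learner may differ), here is a single online gradient step on squared error for a linear quality predictor.

```python
# Sketch of one online update for a linear QE model: predict the
# quality score of an instance, then adjust the weights immediately.
def online_update(w, x, y, lr=0.1):
    """w: weights; x: features; y: observed quality score."""
    pred = sum(wi * xi for wi, xi in zip(w, x))
    err = pred - y
    return [wi - lr * err * xi for wi, xi in zip(w, x)]

w = online_update([0.0, 0.0], [1.0, 1.0], 1.0)  # -> [0.1, 0.1]
```

Updating per instance is what gives the framework its reactivity: a user or domain change shifts the incoming data, and the weights track it without retraining.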
EDU 0:
EDU 1: in this paper we address the problem of grounding distributional representations of lexical meaning .
EDU 2: we introduce a new model
EDU 3: which uses stacked autoencoders
EDU 4: to learn higher-level embeddings from textual and visual input .
EDU 5: the two modalities are encoded as vectors of attributes
EDU 6: and are obtained automatically from text and images , respectively .
EDU 7: we evaluate our model on its ability
EDU 8: to simulate similarity judgments and concept categorization .
EDU 9: on both tasks , our approach outperforms baselines and related models .
EDU 0:
EDU 1: we propose three improvements
EDU 2: to address the drawbacks of state-of-the-art transition-based constituent parsers .
EDU 3: first , to resolve the error propagation problem of the traditional pipeline approach ,
EDU 4: we incorporate pos tagging into the syntactic parsing process .
EDU 5: second , to alleviate the negative influence of size differences among competing action sequences ,
EDU 6: we align parser states during beam-search decoding .
EDU 7: third , to enhance the power of parsing models ,
EDU 8: we enlarge the feature set with non-local features and semi-supervised word cluster features .
EDU 9: experimental results show
EDU 10: that these modifications improve parsing performance significantly .
EDU 11: evaluated on the chinese treebank ( ctb ) ,
EDU 12: our final performance reaches 00.0 % ( f0 )
EDU 13: when trained on ctb 0.0 ,
EDU 14: and 00.0 %
EDU 15: when trained on ctb 0.0 ,
EDU 16: and these results outperform all state-of-the-art parsers .
EDU 0:
EDU 1: in this paper , we investigate various strategies
EDU 2: to predict both syntactic dependency parsing and contiguous multiword expression ( mwe ) recognition ,
EDU 3: testing them on the dependency version of the french treebank ( abeillé and barrier , 0000 ) ,
EDU 4: as instantiated in the spmrl shared task ( seddah et al. , 0000 ) .
EDU 5: our work focuses on using an alternative representation of syntactically regular mwes ,
EDU 6: which captures their syntactic internal structure .
EDU 7: we obtain a system with comparable performance to that of previous works on this dataset ,
EDU 8: but which predicts both syntactic dependencies and the internal structure of mwes .
EDU 9: this can be useful for capturing the various degrees of semantic compositionality of mwes .
EDU 0:
EDU 1: this paper presents a novel framework
EDU 2: called error case frames
EDU 3: for correcting preposition errors .
EDU 4: they are case frames
EDU 5: specially designed for describing and correcting preposition errors .
EDU 6: their most distinct advantage is that they can correct errors with feedback messages
EDU 7: explaining why the preposition is erroneous .
EDU 8: this paper proposes a method
EDU 9: for automatically generating them
EDU 10: by comparing learner and native corpora .
EDU 11: experiments show
EDU 12: ( i ) automatically generated error case frames achieve a performance comparable to conventional methods ;
EDU 13: ( ii ) error case frames are intuitively interpretable and manually modifiable to improve them ;
EDU 14: ( iii ) feedback messages provided by error case frames are effective in language learning assistance .
EDU 15: considering these advantages and the fact
EDU 16: that it has been difficult to provide feedback messages with automatically generated rules ,
EDU 17: error case frames will likely be one of the major approaches for preposition error correction .
EDU 0:
EDU 1: widely used in speech and language processing ,
EDU 2: kneser-ney ( kn ) smoothing has consistently been shown to be one of the best-performing smoothing methods .
EDU 3: however , kn smoothing assumes integer counts ,
EDU 4: limiting its potential uses , for example inside expectation-maximization .
EDU 5: in this paper , we propose a generalization of kn smoothing
EDU 6: that operates on fractional counts , or , more precisely , on distributions over counts .
EDU 7: we rederive all the steps of kn smoothing to operate on count distributions instead of integral counts ,
EDU 8: and apply it to two tasks
EDU 9: where kn smoothing was not applicable before :
EDU 10: one in language model adaptation , and the other in word alignment .
EDU 11: in both cases , our method improves performance significantly .
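The fractional-count setting can be illustrated with plain absolute discounting, one ingredient of KN smoothing. This is a simplified sketch, not the generalized KN method itself: counts may be non-integer (e.g. expected counts from an E-step), a fixed discount d is subtracted from each active count, and the reserved mass backs off uniformly.

```python
# Sketch: absolute discounting over fractional counts, the kind of
# quantity generalized KN smoothing must handle. Uniform backoff is a
# simplification; mass is conserved when every active count is >= d.
def discounted_prob(counts, w, d=0.75):
    total = sum(counts.values())
    n_active = sum(1 for c in counts.values() if c > 0)
    reserved = d * n_active / total          # mass removed by discounting
    return max(counts[w] - d, 0.0) / total + reserved / len(counts)

p = discounted_prob({"a": 1.5, "b": 0.5}, "a", d=0.5)  # -> 0.75
```

Full KN additionally replaces the uniform backoff with continuation probabilities; extending that step to distributions over counts is the paper's contribution.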
EDU 0:
EDU 1: entity clustering must determine when two named-entity mentions refer to the same entity .
EDU 2: typical approaches use a pipeline architecture
EDU 3: that clusters the mentions
EDU 4: using fixed or learned measures of name and context similarity .
EDU 5: in this paper , we propose a model for cross-document coreference resolution
EDU 6: that achieves robustness
EDU 7: by learning similarity from unlabeled data .
EDU 8: the generative process assumes
EDU 9: that each entity mention arises from copying and optionally mutating an earlier name from a similar context .
EDU 10: clustering the mentions into entities depends on recovering this copying tree jointly
EDU 11: with estimating models of the mutation process and parent selection process .
EDU 12: we present a block gibbs sampler for posterior inference and an empirical evaluation on several datasets .
EDU 0:
EDU 1: we introduce three linguistically motivated structured regularizers
EDU 2: based on parse trees , topics , and hierarchical word clusters for text categorization .
EDU 3: these regularizers impose linguistic bias in feature weights ,
EDU 4: enabling us to incorporate prior knowledge into conventional bag-of-words models .
EDU 5: we show
EDU 6: that our structured regularizers consistently improve classification accuracies
EDU 7: compared to standard regularizers
EDU 8: that penalize features in isolation
EDU 9: ( such as lasso , ridge , and elastic net regularizers )
EDU 10: on a range of datasets for various text prediction problems :
EDU 11: topic classification , sentiment analysis , and forecasting .
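A structured regularizer of the kind described above can be sketched as a group lasso whose groups are linguistically defined. This is a hedged sketch: the group definitions (e.g. all features under one parse-tree node, one topic, or one word cluster) are illustrative.

```python
import math

# Sketch: a group-lasso penalty over linguistically defined feature
# groups, summing the l2 norm of each group's weights. Unlike lasso or
# ridge, features in a group are penalized jointly, not in isolation.
def group_lasso_penalty(weights, groups, lam=1.0):
    """weights: list of floats; groups: list of index lists."""
    return lam * sum(math.sqrt(sum(weights[i] ** 2 for i in g))
                     for g in groups)

pen = group_lasso_penalty([3.0, 4.0, 0.0], [[0, 1], [2]])  # -> 5.0
```

The joint norm drives whole groups to zero together, which is how linguistic structure becomes a bias on the feature weights.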
EDU 0:
EDU 1: this paper studies , from a theoretical standpoint , the idea of removing low-frequency words from a corpus ,
EDU 2: which is a common practice
EDU 3: to reduce computational costs .
EDU 4: based on the assumption
EDU 5: that a corpus follows zipf's law ,
EDU 6: we derive trade-off formulae of the perplexity of k-gram models and topic models with respect to the size of the reduced vocabulary .
EDU 7: in addition , we show an approximate behavior of each formula under certain conditions .
EDU 8: we verify the correctness of our theory on synthetic corpora
EDU 9: and examine the gap between theory and practice on real corpora .
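One ingredient of such Zipfian derivations is easy to make concrete. Under the assumption f(r) ∝ 1/r, the fraction of corpus tokens retained when the vocabulary is cut from its V types to the K most frequent is a ratio of harmonic numbers; this sketch computes it directly (the paper's perplexity formulae build on quantities of this kind).

```python
# Sketch: under Zipf's law f(r) ~ 1/r, the token mass kept when the
# vocabulary is truncated to its K most frequent of V types is
# H(K) / H(V), a ratio of harmonic numbers.
def kept_mass(K, V):
    H = lambda n: sum(1.0 / r for r in range(1, n + 1))
    return H(K) / H(V)

# keeping the single most frequent of two equally ranked-by-Zipf types
print(kept_mass(1, 2))  # -> 2/3 of all tokens
```

Because H grows only logarithmically, aggressive truncation discards surprisingly little token mass, which is why the practice is cheap.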
EDU 0:
EDU 1: we propose a two-phase framework
EDU 2: to adapt existing relation extraction classifiers to extract relations for new target domains .
EDU 3: we address two challenges :
EDU 4: negative transfer
EDU 5: when knowledge in source domains is used
EDU 6: without considering the differences in relation distributions ;
EDU 7: and lack of adequate labeled samples for rarer relations in the new domain ,
EDU 8: due to a small labeled data set and imbalanced relation distributions .
EDU 9: our framework leverages both labeled and unlabeled data in the target domain .
EDU 10: first , we determine the relevance of each source domain to the target domain for each relation type ,
EDU 11: using the consistency between the clustering
EDU 12: given by the target domain labels
EDU 13: and the clustering
EDU 14: given by the predictors
EDU 15: trained for the source domain .
EDU 16: to overcome the lack of labeled samples for rarer relations ,
EDU 17: these clusterings operate on both the labeled and unlabeled data in the target domain .
EDU 18: second , we trade-off between using relevance-weighted source-domain predictors and the labeled target data .
EDU 19: again , to overcome the imbalanced distribution ,
EDU 20: the source-domain predictors operate on the unlabeled target data .
EDU 21: our method outperforms numerous baselines and a weakly-supervised relation extraction method on ace 0000 and yago .
EDU 0:
EDU 1: most existing relation extraction models make predictions for each entity pair locally and individually ,
EDU 2: while ignoring implicit global clues available in the knowledge base ,
EDU 3: sometimes leading to conflicts among local predictions from different entity pairs .
EDU 4: in this paper , we propose a joint inference framework
EDU 5: that utilizes these global clues to resolve disagreements among local predictions .
EDU 6: we exploit two kinds of clues
EDU 7: to generate constraints
EDU 8: which can capture the implicit type and cardinality requirements of a relation .
EDU 9: experimental results on three datasets , in both english and chinese , show
EDU 10: that our framework outperforms the state-of-the-art relation extraction models
EDU 11: when such clues are applicable to the datasets .
EDU 12: moreover , we find
EDU 13: that the clues
EDU 14: learnt automatically from existing knowledge bases
EDU 15: perform comparably to those
EDU 16: refined by humans .
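A cardinality constraint of the kind described above can be sketched directly. This is a simplified illustration, not the paper's inference framework: for a functional relation (one where each subject takes at most one object, e.g. capital_of), conflicting local predictions are resolved by keeping only the best-scoring object per subject.

```python
# Sketch: enforce a cardinality constraint over local predictions.
# predictions: list of (subject, relation, object, score) tuples;
# keep only the highest-scoring object per (subject, relation).
def enforce_cardinality(predictions):
    best = {}
    for s, r, o, sc in predictions:
        key = (s, r)
        if key not in best or sc > best[key][3]:
            best[key] = (s, r, o, sc)
    return sorted(best.values())

resolved = enforce_cardinality([("fr", "capital", "paris", 0.9),
                                ("fr", "capital", "lyon", 0.4)])
```

Type constraints work analogously, discarding local predictions whose argument types violate what the knowledge base implies for the relation.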
EDU 0:
EDU 1: in this paper , we present a manifold model for medical relation extraction .
EDU 2: our model is built upon a medical corpus
EDU 3: containing 00m sentences
EDU 4: ( 00 gigabyte text )
EDU 5: and designed to accurately and efficiently detect the key medical relations
EDU 6: that can facilitate clinical decision making .
EDU 7: our approach integrates domain specific parsing and typing systems ,
EDU 8: and can utilize labeled as well as unlabeled examples .
EDU 9: to provide users with more flexibility ,
EDU 10: we also take label weight into consideration .
EDU 11: the effectiveness of our model is demonstrated both theoretically , with a proof
EDU 12: showing
EDU 13: that the solution is closed-form , and experimentally , with positive results .
EDU 0:
EDU 1: the essence of distantly supervised relation extraction is that it is an incomplete multi-label classification problem with sparse and noisy features .
EDU 2: to tackle the sparsity and noise challenges ,
EDU 3: we propose solving the classification problem
EDU 4: using matrix completion on a factorized matrix of minimized rank .
EDU 5: we formulate relation classification as completing the unknown labels of testing items
EDU 6: ( entity pairs )
EDU 7: in a sparse matrix
EDU 8: that concatenates training and testing textual features with training labels .
EDU 9: our algorithmic framework is based on the assumption
EDU 10: that the rank of the item-by-feature and item-by-label joint matrix is low .
EDU 11: we apply two optimization models
EDU 12: to recover the underlying low-rank matrix
EDU 13: leveraging the sparsity of the feature-label matrix .
EDU 14: the matrix completion problem is then solved by the fixed point continuation ( fpc ) algorithm ,
EDU 15: which can find the global optimum .
EDU 16: experiments on two widely used datasets with different dimensions of textual features demonstrate
EDU 17: that our low-rank matrix completion approach significantly outperforms the baseline and the state-of-the-art methods .
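A minimal numerical sketch of the shrinkage idea behind fixed point continuation as invoked above: a plain proximal-gradient loop with singular-value soft-thresholding. Parameter names are hypothetical, and the paper's continuation schedule and rank factorization are not reproduced:

```python
import numpy as np

def svt(Y, tau):
    """Singular-value soft-thresholding: the shrinkage step of FPC."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def fpc_complete(M, observed, mu=0.1, step=1.0, iters=500):
    """Fixed-point-continuation-style low-rank matrix completion sketch.

    M        : matrix with arbitrary values in unobserved entries
    observed : boolean mask of known entries
    mu       : nuclear-norm weight ; step : gradient step size

    Alternates a gradient step on the observed-entry fit with
    singular-value shrinkage, converging to a low-rank completion.
    """
    X = np.zeros_like(M, dtype=float)
    for _ in range(iters):
        G = np.where(observed, X - M, 0.0)   # gradient of the data-fit term
        X = svt(X - step * G, step * mu)
    return X
```

On data that is genuinely low rank, the missing entries are filled in consistently with the dominant singular structure.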
EDU 0:
EDU 1: transitional expressions provide the glue
EDU 2: that holds ideas together in a text
EDU 3: and enhance its logical organization ,
EDU 4: which together help improve the readability of a text .
EDU 5: however , in most current statistical machine translation ( smt ) systems , the outputs of compound-complex sentences still lack proper transitional expressions .
EDU 6: as a result , the translations are often hard to read and understand .
EDU 7: to address this issue ,
EDU 8: we propose two novel models
EDU 9: to encourage generating such transitional expressions
EDU 10: by introducing the source compound-complex sentence structure ( css ) .
EDU 11: our models include a css-based translation model ,
EDU 12: which generates new css-based translation rules , and a generative transfer model ,
EDU 13: which encourages producing transitional expressions
EDU 14: during decoding .
EDU 15: the two models are integrated into a hierarchical phrase-based translation system
EDU 16: to evaluate their effectiveness .
EDU 17: the experimental results show
EDU 18: that significant improvements are achieved on various test data
EDU 19: while the translations become more cohesive and smooth .
EDU 0:
EDU 1: we present an adaptive translation quality estimation ( qe ) method
EDU 2: to predict the human-targeted translation error rate ( hter ) for a document-specific machine translation model .
EDU 3: we first introduce features
EDU 4: derived internally from the translation decoding process as well as externally from source-sentence analysis .
EDU 5: we show the effectiveness of such features in both classification and regression of mt quality .
EDU 6: by dynamically training the qe model for the document-specific mt model ,
EDU 7: we are able to achieve consistency and prediction quality across multiple documents ,
EDU 8: demonstrated by higher correlation coefficients and f-scores in finding good sentences .
EDU 9: additionally , the proposed method is applied to an ibm english-to-japanese mt post-editing field study ,
EDU 10: and we observe strong correlation with human preference , with a 00 % increase in human translators' productivity .
EDU 0:
EDU 1: in this paper we present new research in translation assistance .
EDU 2: we describe a system capable of translating native language ( l0 ) fragments to foreign language ( l0 ) fragments in an l0 context .
EDU 3: practical applications of this research can be framed in the context of second language learning .
EDU 4: the type of translation assistance system under investigation here encourages language learners to write in their target language
EDU 5: while allowing them to fall back to their native language
EDU 6: in case the correct word or expression is not known .
EDU 7: these code switches are subsequently translated to l0
EDU 8: given the l0 context .
EDU 9: we study the feasibility of exploiting cross-lingual context
EDU 10: to obtain high-quality translation suggestions
EDU 11: that improve over statistical language modelling and word-sense disambiguation baselines .
EDU 12: a classification-based approach is presented
EDU 13: that is indeed found
EDU 14: to improve significantly over these baselines
EDU 15: by making use of a contextual window
EDU 16: spanning a small number of neighbouring words .
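The kind of classifier the abstract describes — choosing a target-language translation for a fallback fragment from a small window of neighbouring words — can be illustrated with a toy memory-based sketch. The data structures and overlap scoring below are invented for illustration and are not the paper's system:

```python
from collections import Counter

def train(examples):
    """examples : list of ( l1_fragment , context_words , l2_translation )."""
    memory = {}
    for frag, ctx, l2 in examples:
        memory.setdefault(frag, []).append((set(ctx), l2))
    return memory

def translate(memory, frag, ctx):
    """Pick the stored translation whose training context overlaps most
    with the current window of neighbouring words."""
    ctx = set(ctx)
    scores = Counter()
    for stored_ctx, l2 in memory.get(frag, []):
        scores[l2] += 1 + len(ctx & stored_ctx)
    return scores.most_common(1)[0][0] if scores else None
```

An ambiguous fragment like dutch "bank" (couch vs. financial institution) is disambiguated purely by the surrounding window.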
EDU 0:
EDU 1: we propose a novel learning approach for statistical machine translation ( smt )
EDU 2: that allows supervision signals for structured learning to be extracted from an extrinsic response to a translation input .
EDU 3: we show
EDU 4: how to generate responses
EDU 5: by grounding smt in the task
EDU 6: of executing a semantic parse of a translated query against a database .
EDU 7: experiments on the geoquery database show an improvement of about 0 points in f0-score for response-based learning over learning from references
EDU 8: only , on returning the correct answer from a semantic parse of a translated query .
EDU 9: in general , our approach alleviates the dependency on human reference translations
EDU 10: and solves the reachability problem in structured learning for smt .
EDU 0:
EDU 1: abstractive text summarization of news requires a way of representing events , such as a collection of pattern clusters
EDU 2: in which every cluster represents an event
EDU 3: ( e.g. , marriage )
EDU 4: and every pattern in the cluster is a way of expressing the event
EDU 5: ( e.g. , x married y , x and y tied the knot ) .
EDU 6: we compare three ways of extracting event patterns :
EDU 7: heuristics-based , compression-based and memory-based .
EDU 8: while the first has been used previously in multi-document abstraction ,
EDU 9: the latter two have never been used for this task .
EDU 10: compared with the first two techniques ,
EDU 11: the memory-based method allows for generating significantly more grammatical and informative sentences ,
EDU 12: at the cost of searching a vast space of hundreds of millions of parse trees of known grammatical utterances .
EDU 13: to this end , we introduce a data structure and a search method
EDU 14: that make it possible to efficiently extrapolate from every sentence the parse sub-trees
EDU 15: that match against any of the stored utterances .
EDU 0:
EDU 1: for topics
EDU 2: that cover large amounts of information ,
EDU 3: simple , short summaries are insufficient
EDU 4: - complex topics require more information and more structure to be understood .
EDU 5: we propose a new approach to scaling up summarization
EDU 6: called hierarchical summarization ,
EDU 7: and present the first implemented system , summa .
EDU 8: summa produces a hierarchy of relatively short summaries ,
EDU 9: where the top level provides a general overview
EDU 10: and users can navigate the hierarchy
EDU 11: to drill down for more details on topics of interest .
EDU 12: compared to flat multi-document summaries ,
EDU 13: users prefer summa ten times as often
EDU 14: and learn just as much ,
EDU 15: and compared to timelines ,
EDU 16: users prefer summa three times as often
EDU 17: and learn more in twice as many cases .
EDU 0:
EDU 1: update summarization is a form of multi-document summarization
EDU 2: where a document set must be summarized in the context of other documents
EDU 3: assumed to be known .
EDU 4: efficient update summarization must focus on identifying new information and avoiding repetition of known information .
EDU 5: in query-focused summarization , the task is to produce a summary as an answer to a given query .
EDU 6: we introduce a new task , query-chain summarization ,
EDU 7: which combines aspects of the two previous tasks :
EDU 8: starting from a given document set ,
EDU 9: increasingly specific queries are considered ,
EDU 10: and a new summary is produced at each step .
EDU 11: this process models exploratory search :
EDU 12: a user explores a new topic
EDU 13: by submitting a sequence of queries ,
EDU 14: inspecting a summary of the result set
EDU 15: and phrasing a new query at each step .
EDU 16: we present a novel dataset
EDU 17: comprising 00 query-chain sessions of length up to 0 with 0 matching human summaries each in the consumer-health domain .
EDU 18: our analysis demonstrates
EDU 19: that summaries produced in the context of such an exploratory process differ from informative summaries .
EDU 20: we present an algorithm for query-chain summarization
EDU 21: based on a new lda topic model variant .
EDU 22: evaluation indicates
EDU 23: the algorithm improves on strong baselines .
EDU 0:
EDU 1: we study the use of temporal information in the form of timelines
EDU 2: to enhance multi-document summarization .
EDU 3: we employ a fully automated temporal processing system
EDU 4: to generate a timeline for each input document .
EDU 5: we derive three features from these timelines ,
EDU 6: and show
EDU 7: that their use in supervised summarization leads to a significant 0.0 % improvement in rouge performance over a state-of-the-art baseline .
EDU 8: in addition , we propose timemmr , a modification to maximal marginal relevance
EDU 9: that promotes temporal diversity
EDU 10: by way of computing time span similarity ,
EDU 11: and show its utility in summarizing certain document sets .
EDU 12: we also propose a filtering metric
EDU 13: to discard noisy timelines
EDU 14: generated by our automatic processes ,
EDU 15: to purify the timeline input for summarization .
EDU 16: by selectively using timelines
EDU 17: guided by filtering ,
EDU 18: overall summarization performance is increased by a significant 0.0 % .
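The timemmr idea — penalizing candidates whose time spans overlap with already-selected content — can be sketched as a plain MMR loop with an added time-span term. The weights and the exact combination below are assumptions for illustration, not the paper's formulation:

```python
def time_span_similarity(a, b):
    """Jaccard overlap of two (start, end) time spans (an assumed form)."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 1.0

def time_mmr(cands, rel, text_sim, lam=0.7, mu=0.2, k=3):
    """MMR selection with an extra temporal-diversity penalty.

    cands    : list of (sentence_id, time_span) pairs
    rel      : dict sentence_id -> relevance score
    text_sim : function (id, id) -> textual similarity
    """
    selected, pool = [], list(cands)
    while pool and len(selected) < k:
        def score(cand):
            sid, span = cand
            if not selected:
                return lam * rel[sid]
            # redundancy = textual similarity plus time-span overlap
            red = max(text_sim(sid, s) + mu * time_span_similarity(span, sp)
                      for s, sp in selected)
            return lam * rel[sid] - (1 - lam) * red
        best = max(pool, key=score)
        pool.remove(best)
        selected.append(best)
    return [sid for sid, _ in selected]
```

A slightly less relevant sentence covering a different time span can now beat a redundant one from the same span.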
EDU 0:
EDU 1: following the works of carletta ( 0000 ) and artstein and poesio ( 0000 ) ,
EDU 2: there is an increasing consensus within the field
EDU 3: that in order to properly gauge the reliability of an annotation effort ,
EDU 4: chance-corrected measures of inter-annotator agreement should be used .
EDU 5: with this in mind , it is striking that virtually all evaluations of syntactic annotation efforts use uncorrected parser evaluation metrics such as bracket f0
EDU 6: ( for phrase structure )
EDU 7: and accuracy scores
EDU 8: ( for dependencies ) .
EDU 9: in this work we present a chance-corrected metric
EDU 10: based on krippendorff's α ,
EDU 11: adapted to the structure of syntactic annotations and applicable both to phrase structure and dependency annotation without any modifications .
EDU 12: to evaluate our metric
EDU 13: we first present a number of synthetic experiments
EDU 14: to better control the sources of noise
EDU 15: and gauge the metric's responses ,
EDU 16: before finally contrasting the behaviour of our chance-corrected metric with that of uncorrected parser evaluation metrics on real corpora .
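For reference, chance correction here rests on krippendorff's α = 1 − D_o / D_e. A minimal nominal-scale implementation follows; the paper's adaptation to syntactic structure is not reproduced:

```python
from collections import Counter

def krippendorff_alpha(units):
    """Nominal-scale Krippendorff's alpha.

    units : list of units, each a list of the category labels the coders
            assigned to that unit (units with fewer than two labels are
            skipped, as in the coincidence-matrix formulation).
    Returns 1 - D_o / D_e : observed over expected disagreement.
    """
    weighted_disagreements = 0.0
    label_counts = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue
        label_counts.update(labels)
        # ordered disagreeing pairs within the unit, weighted by 1/(m-1)
        ordered = sum(a != b for a in labels for b in labels)
        weighted_disagreements += ordered / (m - 1)
    n = sum(label_counts.values())
    d_o = weighted_disagreements / n
    d_e = sum(label_counts[c] * label_counts[k]
              for c in label_counts for k in label_counts if c != k) / (n * (n - 1))
    return 1.0 - d_o / d_e
```

Perfect agreement yields 1, chance-level agreement 0, and systematic disagreement goes negative — unlike uncorrected accuracy, which stays deceptively high.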
EDU 0:
EDU 1: we present wibi , an approach to the automatic creation of a bitaxonomy for wikipedia ,
EDU 2: that is , an integrated taxonomy of wikipedia pages and categories .
EDU 3: we leverage the information available in either one of the taxonomies
EDU 4: to reinforce the creation of the other taxonomy .
EDU 5: our experiments show higher quality and coverage than state-of-the-art resources like dbpedia , yago , menta , wikinet and wikitaxonomy .
EDU 6: wibi is available at http : //wibitaxonomy.org .
EDU 0:
EDU 1: answering natural language questions
EDU 2: using the freebase knowledge base
EDU 3: has recently been explored as a platform
EDU 4: for advancing the state of the art in open domain semantic parsing .
EDU 5: those efforts map questions to sophisticated meaning representations
EDU 6: that they then attempt to match against viable answer candidates in the knowledge base .
EDU 7: here we show
EDU 8: that relatively modest information extraction techniques ,
EDU 9: when paired with a web-scale corpus ,
EDU 10: can outperform these sophisticated approaches by roughly 00 % relative gain .
EDU 0:
EDU 1: a typical knowledge-based question answering ( kb-qa ) system faces two challenges :
EDU 2: one is to transform natural language questions into their meaning representations ( mrs ) ;
EDU 3: the other is to retrieve answers from knowledge bases ( kbs )
EDU 4: using generated mrs .
EDU 5: unlike previous methods
EDU 6: which treat them in a cascaded manner ,
EDU 7: we present a translation-based approach
EDU 8: to solve these two tasks in one unified framework .
EDU 9: we translate questions to answers based on cyk parsing .
EDU 10: answers as translations of the span
EDU 11: covered by each cyk cell
EDU 12: are obtained by a question translation method ,
EDU 13: which first generates formal triple queries as mrs for the span
EDU 14: based on question patterns and relation expressions ,
EDU 15: and then retrieves answers from a given kb
EDU 16: based on the generated triple queries .
EDU 17: a linear model is defined over derivations ,
EDU 18: and minimum error rate training is used
EDU 19: to tune feature weights
EDU 20: based on a set of question-answer pairs .
EDU 21: compared to a kb-qa system
EDU 22: using a state-of-the-art semantic parser ,
EDU 23: our method achieves better results .
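The decoding scheme above — every cyk cell holds candidate answers/queries for its span, built from a direct span translation plus combinations of sub-spans — can be caricatured in a few lines. The translator, combiner, and scorer below are toy stand-ins, not the paper's components:

```python
def cyk_decode(tokens, span_translator, combine, score, beam=10):
    """CYK-style chart over token spans: each cell keeps its best-scoring
    candidate translations; larger spans combine the candidates of their
    sub-spans, mirroring bottom-up question-to-answer decoding."""
    n = len(tokens)
    chart = {}
    for width in range(1, n + 1):
        for i in range(n - width + 1):
            j = i + width
            # direct translation of the whole span (e.g. via a pattern)
            cands = list(span_translator(tuple(tokens[i:j])))
            # combinations of every binary split of the span
            for k in range(i + 1, j):
                for a in chart[(i, k)]:
                    for b in chart[(k, j)]:
                        cands.extend(combine(a, b))
            chart[(i, j)] = sorted(cands, key=score, reverse=True)[:beam]
    full = chart[(0, n)]
    return full[0] if full else None
```

With a toy lexicon, sub-span translations compose into a formal query for the full question span.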
EDU 0:
EDU 1: we propose a robust answer reranking model for non-factoid questions
EDU 2: that integrates lexical semantics with discourse information ,
EDU 3: driven by two representations of discourse :
EDU 4: a shallow representation
EDU 5: centered around discourse markers ,
EDU 6: and a deep one
EDU 7: based on rhetorical structure theory .
EDU 8: we evaluate the proposed model on two corpora from different genres and domains :
EDU 9: one from yahoo ! answers and one from the biology domain , and two types of non-factoid questions : manner and reason .
EDU 10: we experimentally demonstrate
EDU 11: that the discourse structure of non-factoid answers provides information
EDU 12: that is complementary to lexical semantic similarity between question and answer ,
EDU 13: improving performance by up to 00 % ( relative ) over a state-of-the-art model
EDU 14: that exploits lexical semantic similarity alone .
EDU 15: we further demonstrate excellent domain transfer of discourse information ,
EDU 16: suggesting
EDU 17: these discourse features have general utility to non-factoid question answering .
EDU 0:
EDU 1: we propose a supervised method
EDU 2: of extracting event causalities like conduct slash-and-burn agriculture→exacerbate desertification from the web
EDU 3: using semantic relation
EDU 4: ( between nouns )
EDU 5: , context , and association features .
EDU 6: experiments show
EDU 7: that our method outperforms baselines
EDU 8: that are based on state-of-the-art methods .
EDU 9: we also propose methods
EDU 10: of generating future scenarios like conduct slash-and-burn agriculture→exacerbate desertification→increase asian dust (from china)→asthma gets worse .
EDU 11: experiments show
EDU 12: that we can generate 00,000 scenarios with 00 % precision .
EDU 13: we also generated a scenario deforestation continues→global warming worsens→sea temperatures rise→vibrio parahaemolyticus fouls (water) ,
EDU 14: which is written in no document in our input web corpus
EDU 15: crawled in 0000 .
EDU 16: but the vibrio risk
EDU 17: due to global warming
EDU 18: was observed in baker-austin et al. ( 0000 ) .
EDU 19: thus , we " predicted " the future event sequence in a sense .
EDU 0:
EDU 1: cross-narrative temporal ordering of medical events is essential to the task
EDU 2: of generating a comprehensive timeline over a patient's history .
EDU 3: we address the problem of aligning multiple medical event sequences ,
EDU 4: corresponding to different clinical narratives ,
EDU 5: comparing the following approaches :
EDU 6: ( 0 ) a novel weighted finite state transducer representation of medical event sequences
EDU 7: that enables composition and search for decoding ,
EDU 8: and ( 0 ) dynamic programming with iterative pairwise alignment of multiple sequences
EDU 9: using global and local alignment algorithms .
EDU 10: the cross-narrative coreference and temporal relation weights
EDU 11: used in both these approaches
EDU 12: are learned from a corpus of clinical narratives .
EDU 13: we present results
EDU 14: using both approaches
EDU 15: and observe
EDU 16: that the finite state transducer approach performs significantly better than the dynamic programming one by 0.0 % for the problem of multiple-sequence alignment .
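The dynamic-programming alternative above is essentially classic global sequence alignment. A minimal Needleman-Wunsch-style scorer for two event sequences follows; the learned coreference/temporal weights are abstracted into a match_score callback, and the names are illustrative:

```python
def align(seq_a, seq_b, match_score, gap=-1.0):
    """Global alignment score of two medical event sequences.

    match_score(a, b) plays the role of the learned coreference/temporal
    weights; gap penalizes events present in only one narrative."""
    n, m = len(seq_a), len(seq_b)
    S = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + match_score(seq_a[i - 1], seq_b[j - 1]),
                          S[i - 1][j] + gap,     # event only in narrative a
                          S[i][j - 1] + gap)     # event only in narrative b
    return S[n][m]
```

Local alignment would differ only in clamping cell scores at zero and taking the chart maximum.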
EDU 0:
EDU 1: this paper introduces factchecker , a language-aware approach to truth-finding .
EDU 2: factchecker differs from prior approaches
EDU 3: in that it does not rely on iterative peer voting ,
EDU 4: instead it leverages language
EDU 5: to infer believability of fact candidates .
EDU 6: in particular , factchecker makes use of linguistic features
EDU 7: to detect
EDU 8: if a given source objectively states facts
EDU 9: or is speculative and opinionated .
EDU 10: to ensure that fact candidates
EDU 11: mentioned in similar sources
EDU 12: have similar believability ,
EDU 13: factchecker augments objectivity with a co-mention score
EDU 14: to compute the overall believability score of a fact candidate .
EDU 15: our experiments on various datasets show
EDU 16: that factchecker yields higher accuracy than existing approaches .
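The combination described above can be sketched as blending the two signals into one believability score. The linear combination, the weight, and all scores below are assumptions for illustration; the paper's actual scoring functions are not shown here.

```python
# Illustrative sketch: augment a source-objectivity score with a co-mention
# score to get an overall believability score, then rank fact candidates.
# The weighted sum and the alpha value are hypothetical simplifications.

def believability(objectivity, co_mention, alpha=0.7):
    """Blend a source-objectivity score with a co-mention score, both in [0, 1]."""
    return alpha * objectivity + (1 - alpha) * co_mention

def rank_candidates(candidates):
    """candidates: {fact: (objectivity, co_mention)} -> facts by believability."""
    return sorted(candidates, key=lambda f: believability(*candidates[f]), reverse=True)
```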
EDU 0:
EDU 1: in this paper , we propose an unsupervised method
EDU 2: to identify noun sense changes
EDU 3: based on rigorous analysis of time-varying text data available in the form of millions of digitized books .
EDU 4: we construct distributional thesauri based networks from data at different time points
EDU 5: and cluster each of them separately
EDU 6: to obtain word-centric sense clusters
EDU 7: corresponding to the different time points .
EDU 8: subsequently , we compare these sense clusters of two different time points
EDU 9: to find
EDU 10: if ( i ) there is birth of a new sense or
EDU 11: ( ii ) if an older sense has split into more than one sense or
EDU 12: ( iii ) if a newer sense has been formed from the joining of older senses or
EDU 13: ( iv ) if a particular sense has died .
EDU 14: we conduct a thorough evaluation of the proposed methodology both manually and through comparison with wordnet .
EDU 15: manual evaluation indicates
EDU 16: that the algorithm could correctly identify 00.0 % birth cases from a set of 00 randomly picked samples and 00 % split/join cases from a set of 00 randomly picked samples .
EDU 17: remarkably , in 00 % of cases the birth of a novel sense is attested by wordnet ,
EDU 18: while in 00 % and 00 % of cases , split and join , respectively , are confirmed by wordnet .
EDU 19: our approach can be applied for lexicography , as well as for applications like word sense disambiguation or semantic search .
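The comparison of sense clusters across two time points can be sketched with a simple overlap test. The overlap measure, the threshold, and the birth/split/join/death rules below are simplified assumptions; the paper's actual criteria over distributional-thesaurus networks are more involved.

```python
# Minimal sketch: classify sense changes by overlapping word-centric sense
# clusters (sets of neighbor words) from an older and a newer time point.

def _matches(cluster, others, threshold=0.5):
    """Indices of clusters in `others` overlapping `cluster` above threshold."""
    out = []
    for i, other in enumerate(others):
        overlap = len(cluster & other) / min(len(cluster), len(other))
        if overlap >= threshold:
            out.append(i)
    return out

def classify_changes(old_clusters, new_clusters):
    events = []
    for i, new in enumerate(new_clusters):
        m = _matches(new, old_clusters)
        if not m:
            events.append(("birth", i))     # no old cluster resembles it
        elif len(m) > 1:
            events.append(("join", i))      # formed from several old senses
    for i, old in enumerate(old_clusters):
        m = _matches(old, new_clusters)
        if not m:
            events.append(("death", i))     # no new cluster resembles it
        elif len(m) > 1:
            events.append(("split", i))     # spread over several new senses
    return events
```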
EDU 0:
EDU 1: we present an unsupervised method
EDU 2: for inducing verb classes from verb uses in giga-word corpora .
EDU 3: our method consists of two clustering steps :
EDU 4: verb-specific semantic frames are first induced
EDU 5: by clustering verb uses in a corpus
EDU 6: and then verb classes are induced
EDU 7: by clustering these frames .
EDU 8: by taking this step-wise approach ,
EDU 9: we can not only generate verb classes
EDU 10: based on a massive amount of verb uses in a scalable manner ,
EDU 11: but also deal with verb polysemy ,
EDU 12: which is bypassed by most of the previous studies on verb clustering .
EDU 13: in our experiments , we acquire semantic frames and verb classes from two giga-word corpora , the larger
EDU 14: comprising 00 billion words .
EDU 15: the effectiveness of our approach is verified through quantitative evaluations
EDU 16: based on polysemy-aware gold-standard data .
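The two clustering steps above can be sketched in miniature. Real clustering in the paper is statistical over giga-word data; the exact-match frame grouping and the greedy overlap merge below are simplifying assumptions.

```python
# Toy sketch of the step-wise procedure: verb uses -> verb-specific frames
# (step 1), then frames -> cross-verb classes by shared arguments (step 2).
from collections import defaultdict

def induce_frames(uses):
    """uses: list of (verb, frozenset_of_argument_words) -> {(verb, args): count}."""
    frames = defaultdict(int)
    for verb, args in uses:
        frames[(verb, args)] += 1
    return dict(frames)

def induce_classes(frames, threshold=0.5):
    """Greedily merge frames whose argument sets overlap above threshold."""
    classes = []
    for (verb, args), _ in sorted(frames.items(), key=lambda kv: kv[0][0]):
        for cls in classes:
            shared = len(args & cls["args"]) / min(len(args), len(cls["args"]))
            if shared >= threshold:
                cls["verbs"].add(verb)   # same class: verbs with similar frames
                cls["args"] |= args
                break
        else:
            classes.append({"verbs": {verb}, "args": set(args)})
    return classes
```

Because a polysemous verb contributes several distinct frames, its different senses can end up in different classes, which is the point of clustering frames rather than verbs directly.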
EDU 0:
EDU 1: we present a structured learning approach to inducing hypernym taxonomies
EDU 2: using a probabilistic graphical model formulation .
EDU 3: our model incorporates heterogeneous relational evidence about both hypernymy and siblinghood ,
EDU 4: captured by semantic features
EDU 5: based on patterns and statistics from web n-grams and wikipedia abstracts .
EDU 6: for efficient inference over taxonomy structures , we use loopy belief propagation along with a directed spanning tree algorithm for the core hypernymy factor .
EDU 7: to train the system ,
EDU 8: we extract sub-structures of wordnet
EDU 9: and discriminatively learn to reproduce them ,
EDU 10: using adaptive subgradient stochastic optimization .
EDU 11: on the task
EDU 12: of reproducing sub-hierarchies of wordnet ,
EDU 13: our approach achieves a 00 % error reduction over a chance baseline ,
EDU 14: including a 00 % error reduction
EDU 15: due to the non-hypernym-factored sibling features .
EDU 16: on a comparison setup , we find up to 00 % relative error reduction over previous work on ancestor f0 .
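The core hypernymy step can be illustrated as picking, for each term, its highest-scoring parent. This greedy best-parent choice is only the first step of a directed-spanning-tree algorithm such as Chu-Liu/Edmonds (the full algorithm must also contract cycles); the score table below is a toy assumption standing in for the model's learned hypernymy factors.

```python
# Sketch: build a taxonomy tree by choosing each term's best-scoring parent.
# Assumes the greedy choices happen to be acyclic, which the real
# directed-spanning-tree algorithm does not need to assume.

def greedy_taxonomy(terms, score, root):
    """score(parent, child) -> float; returns {child: parent}."""
    parent = {}
    for child in terms:
        if child == root:
            continue
        parent[child] = max((p for p in terms if p != child),
                            key=lambda p: score(p, child))
    return parent
```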
EDU 0:
EDU 1: we introduce a provably correct learning algorithm for latent-variable pcfgs .
EDU 2: the algorithm relies on two steps :
EDU 3: first , the use of a matrix-decomposition algorithm
EDU 4: applied to a co-occurrence matrix
EDU 5: estimated from the parse trees in a training sample ;
EDU 6: second , the use of em
EDU 7: applied to a convex objective
EDU 8: derived from the training samples in combination with the output from the matrix decomposition .
EDU 9: experiments on parsing and a language modeling problem show
EDU 10: that the algorithm is efficient and effective in practice .
EDU 0:
EDU 1: we propose a spectral approach for unsupervised constituent parsing
EDU 2: that comes with theoretical guarantees on latent structure recovery .
EDU 3: our approach is grammar-less -
EDU 4: we directly learn the bracketing structure of a given sentence
EDU 5: without using a grammar model .
EDU 6: the main algorithm is based on lifting the concept of additive tree metrics for structure learning of latent trees in the phylogenetic and machine learning communities to the case
EDU 7: where the tree structure varies across examples .
EDU 8: although finding the "minimal" latent tree is np-hard in general ,
EDU 9: for the case of projective trees we find
EDU 10: that it can be found
EDU 11: using bilexical parsing algorithms .
EDU 12: empirically , our algorithm performs favorably
EDU 13: compared to the constituent context model of klein and manning ( 0000 )
EDU 14: without the need for careful initialization .
EDU 0:
EDU 1: semantic parsers map natural language statements into meaning representations ,
EDU 2: and must abstract over syntactic phenomena ,
EDU 3: resolve anaphora ,
EDU 4: and identify word senses
EDU 5: to eliminate ambiguous interpretations .
EDU 6: abstract meaning representation ( amr ) is a recent example of one such semantic formalism
EDU 7: which , similar to a dependency parse , utilizes a graph
EDU 8: to represent relationships between concepts ( banarescu et al. , 0000 ) .
EDU 9: as with dependency parsing ,
EDU 10: transition-based approaches are a common approach to this problem .
EDU 11: however , when trained in the traditional manner
EDU 12: these systems are susceptible to the accumulation of errors
EDU 13: when they find undesirable states during greedy decoding .
EDU 14: imitation learning algorithms have been shown
EDU 15: to help these systems recover from such errors .
EDU 16: to effectively use these methods for amr parsing
EDU 17: we find it highly beneficial to introduce two novel extensions :
EDU 18: noise reduction and targeted exploration .
EDU 19: the former mitigates the noise in the feature representation ,
EDU 20: a result of the complexity of the task .
EDU 21: the latter targets the exploration steps of imitation learning towards areas
EDU 22: which are likely to provide the most information in the context of a large action-space .
EDU 23: we achieve state-of-the-art results ,
EDU 24: and improve upon standard transition-based parsing by 0.0 f0 points .
EDU 0:
EDU 1: modeling crisp logical regularities is crucial in semantic parsing ,
EDU 2: making it difficult for neural models with no task-specific prior knowledge to achieve good results .
EDU 3: in this paper , we introduce data recombination ,
EDU 4: a novel framework
EDU 5: for injecting such prior knowledge into a model .
EDU 6: from the training data , we induce a high-precision synchronous context-free grammar ,
EDU 7: which captures important conditional independence properties
EDU 8: commonly found in semantic parsing .
EDU 9: we then train a sequence-to-sequence recurrent neural network ( rnn ) model with a novel attention-based copying mechanism on datapoints
EDU 10: sampled from this grammar ,
EDU 11: thereby teaching the model about these structural properties .
EDU 12: data recombination improves the accuracy of our rnn model on three semantic parsing datasets ,
EDU 13: leading to new state-of-the-art performance on the standard geoquery dataset for models with comparable supervision .
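The recombination idea can be sketched with its simplest rule: abstract a known entity out of each training pair to obtain a template, then re-fill templates with other entities to synthesize new pairs. This is one simple recombination rule; the paper induces a richer synchronous CFG, and the example strings below are hypothetical.

```python
# Minimal sketch of entity-based data recombination for semantic parsing:
# (utterance, logical form) pairs -> templates -> new synthetic pairs.

def induce_templates(pairs, entities):
    """Replace a known entity with a placeholder in both utterance and form."""
    templates = []
    for utt, form in pairs:
        for ent in entities:
            if ent in utt and ent in form:
                templates.append((utt.replace(ent, "<ENT>"),
                                  form.replace(ent, "<ENT>")))
                break
    return templates

def recombine(templates, entities):
    """Fill every template with every entity to create synthetic pairs."""
    return [(u.replace("<ENT>", e), f.replace("<ENT>", e))
            for u, f in templates for e in entities]
```

The synthetic pairs are then mixed into the training data for the sequence-to-sequence model, teaching it that the entity slot and the surrounding structure vary independently.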
EDU 0:
EDU 1: a core problem in learning semantic parsers from denotations is picking out consistent logical forms - those
EDU 2: that yield the correct denotation - from a combinatorially large space .
EDU 3: to control the search space ,
EDU 4: previous work relied on a restricted set of rules ,
EDU 5: which limits expressivity .
EDU 6: in this paper , we consider a much more expressive class of logical forms ,
EDU 7: and show
EDU 8: how to use dynamic programming
EDU 9: to efficiently represent the complete set of consistent logical forms .
EDU 10: expressivity also introduces many more spurious logical forms
EDU 11: which are consistent with the correct denotation
EDU 12: but do not represent the meaning of the utterance .
EDU 13: to address this ,
EDU 14: we generate fictitious worlds
EDU 15: and use crowdsourced denotations on these worlds
EDU 16: to filter out spurious logical forms .
EDU 17: on the wikitablequestions dataset , we increase the coverage of answerable questions from 00.0 % to 00 % ,
EDU 18: and the additional crowdsourced supervision lets us rule out 00.0 % of spurious logical forms .
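The dynamic-programming idea can be illustrated in miniature: enumerate compositions of logical forms, but store them in a chart keyed by what they denote, so all forms consistent with a target denotation are recovered together. The tiny arithmetic "logical forms" below are stand-ins for real table queries, and the one-round enumeration is an illustrative simplification.

```python
# Toy sketch: a chart mapping denotation -> set of logical-form strings,
# so chart[target] is the complete set of consistent forms at this depth.

def build_chart(atoms, ops, rounds=1):
    chart = {v: {str(v)} for v in atoms}
    for _ in range(rounds):
        snapshot = list(chart)               # denotations found so far
        for a in snapshot:
            for b in snapshot:
                for name, fn in ops.items():
                    val = fn(a, b)
                    form = f"{name}({a},{b})"
                    chart.setdefault(val, set()).add(form)
    return chart
```

Note how several spurious forms can share one denotation (e.g. `add(2,2)` and `mul(2,2)` both denote 4); the crowdsourced fictitious worlds in the paper are what tell such forms apart.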
EDU 0:
EDU 1: semantic parsing aims at mapping natural language to machine interpretable meaning representations .
EDU 2: traditional approaches rely on high-quality lexicons , manually-built templates , and linguistic features
EDU 3: which are either domain- or representation-specific .
EDU 4: in this paper we present a general method
EDU 5: based on an attention-enhanced encoder-decoder model .
EDU 6: we encode input utterances into vector representations ,
EDU 7: and generate their logical forms
EDU 8: by conditioning the output sequences or trees on the encoding vectors .
EDU 9: experimental results on four datasets show
EDU 10: that our approach performs competitively
EDU 11: without using hand-engineered features
EDU 12: and is easy to adapt across domains and meaning representations .
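The attention step at the heart of such an encoder-decoder model can be sketched as follows. Dot-product scoring is an assumption here; the paper's attention may be parameterized differently, and the vectors are toy values.

```python
# Sketch of one attention step: score each encoder vector against the
# current decoder state, softmax the scores, return the weighted context.
import math

def attend(decoder_state, encoder_states):
    scores = [sum(d * e for d, e in zip(decoder_state, enc))
              for enc in encoder_states]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]          # numerically stable softmax
    total = sum(exps)
    weights = [x / total for x in exps]
    context = [sum(w * enc[i] for w, enc in zip(weights, encoder_states))
               for i in range(len(encoder_states[0]))]
    return weights, context
```

The decoder conditions each output token (of the sequence or tree) on this context vector, which is how the logical form stays tied to the encoded utterance.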
EDU 0:
EDU 1: slot filling aims to extract the values ( slot fillers ) of specific attributes ( slot types ) for a given entity ( query ) from a large-scale corpus .
EDU 2: slot filling has remained very challenging over the past seven years .
EDU 3: we propose a simple yet effective unsupervised approach
EDU 4: to extract slot fillers
EDU 5: based on the following two observations :
EDU 6: ( 0 ) a trigger is usually a salient node relative to the query and filler nodes in the dependency graph of a context sentence ;
EDU 7: ( 0 ) a relation is likely to exist
EDU 8: if the query and candidate filler nodes are strongly connected by a relation-specific trigger .
EDU 9: thus we design a graph-based algorithm
EDU 10: to automatically identify triggers
EDU 11: based on personalized pagerank and affinity propagation for a given ( query , filler ) pair
EDU 12: and then label the slot type
EDU 13: based on the identified triggers .
EDU 14: our approach achieves 00.0 % -00 % higher f-score over state-of-the-art english slot filling methods .
EDU 15: our experiments also demonstrate
EDU 16: that as long as a few trigger seeds and name tagging and dependency parsing capabilities exist ,
EDU 17: this approach can be quickly adapted to any language and new slot types .
EDU 18: our promising results on chinese slot filling can serve as a new benchmark .
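The salience intuition behind the trigger-identification step can be sketched with personalized PageRank on a small dependency graph: random walks restart at the query and filler nodes, so words strongly connected to both score highest. The toy graph, the damping value, and the iteration count are illustrative assumptions.

```python
# Sketch: personalized PageRank over a dependency graph to surface trigger
# candidates for a (query, filler) pair. Dangling-node mass is ignored
# for simplicity.

def personalized_pagerank(graph, restart_nodes, damping=0.85, iters=50):
    """graph: {node: [neighbors]}; returns {node: score}."""
    nodes = list(graph)
    restart = {n: (1 / len(restart_nodes) if n in restart_nodes else 0.0)
               for n in nodes}
    rank = dict(restart)
    for _ in range(iters):
        new = {n: (1 - damping) * restart[n] for n in nodes}
        for n in nodes:
            out = graph[n]
            if out:
                share = damping * rank[n] / len(out)
                for m in out:
                    new[m] += share
        rank = new
    return rank
```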
EDU 0:
EDU 1: when a large-scale incident or disaster occurs ,
EDU 2: there is often a great demand for rapidly developing a system
EDU 3: to extract detailed and new information from low-resource languages ( lls ) .
EDU 4: we propose a novel approach
EDU 5: to discover comparable documents in high-resource languages ( hls ) ,
EDU 6: and project entity discovery and linking results from hls documents back to lls .
EDU 7: we leverage a wide variety of language-independent forms from multiple data modalities ,
EDU 8: including image processing ( image-to-image retrieval , visual similarity and face recognition ) and sound matching .
EDU 9: we also propose novel methods
EDU 10: to learn entity priors from a large-scale hl corpus and knowledge base .
EDU 11: using hausa and chinese as the lls and english as the hl ,
EDU 12: experiments show
EDU 13: that our approach achieves 00.0 % higher hausa name tagging f-score over a costly supervised model ,
EDU 14: and 0.0 % higher chinese-to-english entity linking accuracy over state-of-the-art .
EDU 0:
EDU 1: we apply phrase-based and neural models to a core task in interactive machine translation :
EDU 2: suggesting
EDU 3: how to complete a partial translation .
EDU 4: for the phrase-based system , we demonstrate improvements in suggestion quality
EDU 5: using novel objective functions , learning techniques , and inference algorithms
EDU 6: tailored to this task .
EDU 7: our contributions include new tunable metrics , an improved beam search strategy , an n-best extraction method
EDU 8: that increases suggestion diversity ,
EDU 9: and a tuning procedure for a hierarchical joint model of alignment and translation .
EDU 10: the combination of these techniques improves next-word suggestion accuracy dramatically from 00.0 % to 00.0 % in a large-scale english-german experiment .
EDU 11: our recurrent neural translation system increases accuracy yet further to 00.0 % ,
EDU 12: but inference is two orders of magnitude slower .
EDU 13: manual error analysis shows the strengths and weaknesses of both approaches .
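The core suggestion task can be sketched from an n-best list: keep the candidate translations consistent with the translator's current prefix and suggest the most common next word among them. Real systems rescore suggestions with the full translation model; majority voting here is a simplifying assumption, and the example sentences are toy data.

```python
# Sketch of next-word suggestion for interactive MT from an n-best list.
from collections import Counter

def suggest_next_word(nbest, prefix):
    """nbest: list of token lists; prefix: tokens the translator typed so far."""
    votes = Counter()
    for cand in nbest:
        if cand[:len(prefix)] == prefix and len(cand) > len(prefix):
            votes[cand[len(prefix)]] += 1
    return votes.most_common(1)[0][0] if votes else None
```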
EDU 0:
EDU 1: the attention mechanism has enhanced state-of-the-art neural machine translation ( nmt )
EDU 2: by jointly learning to align and translate .
EDU 3: however , it tends to ignore past alignment information ,
EDU 4: which often leads to over-translation and under-translation .
EDU 5: to address this problem ,
EDU 6: we propose coverage-based nmt in this paper .
EDU 7: we maintain a coverage vector
EDU 8: to keep track of the attention history .
EDU 9: the coverage vector is fed to the attention model
EDU 10: to help adjust future attention ,
EDU 11: which lets the nmt system pay more attention to untranslated source words .
EDU 12: experiments show
EDU 13: that the proposed approach significantly improves both translation quality and alignment quality over standard attention-based nmt .
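The coverage mechanism can be sketched as follows: keep a per-source-word coverage vector of accumulated past attention and discourage already-covered words in the next attention step. The additive penalty below is a simplification; the paper feeds the coverage vector into a learned attention network rather than subtracting it directly.

```python
# Minimal sketch of coverage-based attention: penalize source positions in
# proportion to how much attention they have already received.
import math

def coverage_attend(scores, coverage, penalty=1.0):
    """scores: raw attention scores; coverage: accumulated past attention.
    Returns (new_weights, updated_coverage)."""
    adjusted = [s - penalty * c for s, c in zip(scores, coverage)]
    m = max(adjusted)
    exps = [math.exp(a - m) for a in adjusted]          # stable softmax
    total = sum(exps)
    weights = [x / total for x in exps]
    new_coverage = [c + w for c, w in zip(coverage, weights)]
    return weights, new_coverage
```

Repeated calls shift attention away from heavily attended source words, which is exactly the behavior that counteracts over-translation and under-translation.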
EDU 0:
EDU 1: neural machine translation ( nmt ) has obtained state-of-the-art performance for several language pairs ,
EDU 2: while only using parallel data for training .
EDU 3: target-side monolingual data plays an important role
EDU 4: in boosting fluency for phrase-based statistical machine translation ,
EDU 5: and we investigate the use of monolingual data for nmt .
EDU 6: in contrast to previous work ,
EDU 7: which combines nmt models with separately trained language models ,
EDU 8: we note
EDU 9: that encoder-decoder nmt architectures already have the capacity
EDU 10: to learn the same information as a language model ,
EDU 11: and we explore strategies
EDU 12: to train with monolingual data
EDU 13: without changing the neural network architecture .
EDU 14: by pairing monolingual training data with an automatic backtranslation ,
EDU 15: we can treat it as additional parallel training data ,
EDU 16: and we obtain substantial improvements on the wmt 00 task english to german ( +0.0-0.0 bleu ) , and for the low-resource iwslt 00 task turkish to english ( +0.0-0.0 bleu ) ,
EDU 17: obtaining new state-of-the-art results .
EDU 18: we also show
EDU 19: that fine-tuning on in-domain monolingual and parallel data gives substantial improvements for the iwslt 00 task english to german .
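The back-translation recipe above can be sketched as a small data pipeline: translate target-side monolingual sentences back into the source language with a reverse model, then mix the resulting synthetic pairs into the parallel training data. `reverse_model` below is a hypothetical stand-in for a trained target-to-source system.

```python
# Sketch of the back-translation data pipeline for NMT training.

def back_translate(monolingual_targets, reverse_model):
    """Pair each target sentence with its machine back-translation."""
    return [(reverse_model(t), t) for t in monolingual_targets]

def build_training_data(parallel_pairs, monolingual_targets, reverse_model):
    """Real parallel pairs plus synthetic (back-translated, target) pairs."""
    return parallel_pairs + back_translate(monolingual_targets, reverse_model)
```

The key property is that the target side of every synthetic pair is genuine human text, so the decoder learns fluent target language even though the source side is machine output.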
EDU 0:
EDU 1: one major drawback of phrase-based translation is that it segments an input sentence into continuous phrases .
EDU 2: to support linguistically informed source discontinuity ,
EDU 3: in this paper we construct graphs
EDU 4: which combine bigram and dependency relations
EDU 5: and propose a graph-based translation model .
EDU 6: the model segments an input graph into connected subgraphs ,
EDU 7: each of which may cover a discontinuous phrase .
EDU 8: we use beam search
EDU 9: to combine translations of each subgraph left-to-right
EDU 10: to produce a complete translation .
EDU 11: experiments on chinese-english and german-english tasks show
EDU 12: that our system is significantly better than the phrase-based model by up to +0.0/+0.0 bleu scores .
EDU 13: by explicitly modeling the graph segmentation ,
EDU 14: our system obtains further improvement , especially on german-english .
EDU 0:
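The graph construction in the abstract above can be sketched as follows: nodes are word positions, edges combine bigram adjacency with dependency arcs, and a candidate (possibly discontinuous) phrase is licensed when its positions form a connected subgraph. The example sentence and arcs are illustrative, not the paper's implementation:

```python
from collections import defaultdict

def build_graph(words, dep_arcs):
    """Undirected graph over word positions: bigram adjacency
    edges plus dependency-relation edges."""
    adj = defaultdict(set)
    for i in range(len(words) - 1):          # bigram edges
        adj[i].add(i + 1); adj[i + 1].add(i)
    for head, dep in dep_arcs:               # dependency edges
        adj[head].add(dep); adj[dep].add(head)
    return adj

def is_connected(adj, nodes):
    """A candidate phrase covers a connected subgraph iff all of
    its positions are reachable from one another within the set."""
    nodes = set(nodes)
    stack, seen = [next(iter(nodes))], set()
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        stack.extend(m for m in adj[n] if m in nodes)
    return seen == nodes

# "turn the light off": the arc turn->off licenses the
# discontinuous phrase {turn, off}.
adj = build_graph(["turn", "the", "light", "off"], [(0, 3), (2, 1), (0, 2)])
print(is_connected(adj, {0, 3}))   # True: connected via the 0-3 arc
print(is_connected(adj, {1, 3}))   # False: no edge inside {1, 3}
```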
EDU 1: as a new generation of cognitive robots starts to enter our lives ,
EDU 2: it is important to enable robots to follow human commands
EDU 3: and to learn new actions from human language instructions .
EDU 4: to address this issue ,
EDU 5: this paper presents an approach
EDU 6: that explicitly represents verb semantics through hypothesis spaces of fluents
EDU 7: and automatically acquires these hypothesis spaces
EDU 8: by interacting with humans .
EDU 9: the learned hypothesis spaces can be used
EDU 10: to automatically plan for lower-level primitive actions towards physical world interaction .
EDU 11: our empirical results have shown
EDU 12: that the representation of a hypothesis space of fluents ,
EDU 13: combined with the learned hypothesis selection algorithm ,
EDU 14: outperforms a previous baseline .
EDU 15: in addition , our approach applies incremental learning ,
EDU 16: which can contribute to life-long learning from humans in the future .
EDU 0:
EDU 1: we propose a framework for lexical substitution
EDU 2: that is able to perform transfer learning across languages .
EDU 3: datasets for this task are available in at least three languages
EDU 4: ( english , italian , and german ) .
EDU 5: previous work has addressed each of these tasks in isolation .
EDU 6: in contrast , we regard the union of three shared tasks as a combined multilingual dataset .
EDU 7: we show
EDU 8: that a supervised system can be trained effectively ,
EDU 9: even if training and evaluation data are from different languages .
EDU 10: successful transfer learning between languages suggests
EDU 11: that the learned model is in fact independent of the underlying language .
EDU 12: we combine state-of-the-art unsupervised features
EDU 13: obtained from syntactic word embeddings and distributional thesauri
EDU 14: in a supervised delexicalized ranking system .
EDU 15: our system improves over the state of the art in the full lexical substitution task in all three languages .
EDU 0:
EDU 1: we use bayesian optimization
EDU 2: to learn curricula for word representation learning ,
EDU 3: optimizing performance on downstream tasks
EDU 4: that depend on the learned representations as features .
EDU 5: the curricula are modeled by a linear ranking function
EDU 6: which is the scalar product of a learned weight vector and an engineered feature vector
EDU 7: that characterizes the different aspects of the complexity of each instance in the training corpus .
EDU 8: we show
EDU 9: that learning the curriculum improves performance on a variety of downstream tasks
EDU 10: over random orders
EDU 11: and in comparison to the natural corpus order .
EDU 0:
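The linear ranking function in the abstract above can be sketched directly: each instance gets a score from the scalar product of a weight vector with its complexity features, and the curriculum is the sorted order. The features and fixed weights below are hypothetical; in the paper the weights are tuned by Bayesian optimization against downstream performance:

```python
def order_corpus(instances, features, w):
    """Order training instances by a linear ranking function:
    the scalar product of weights w with each instance's
    complexity feature vector."""
    score = lambda x: sum(wi * fi for wi, fi in zip(w, features(x)))
    return sorted(instances, key=score)

# Toy complexity features: sentence length and type/token ratio.
def feats(sent):
    toks = sent.split()
    return [len(toks), len(set(toks)) / len(toks)]

corpus = ["a a a a", "b c", "d e f"]
curriculum = order_corpus(corpus, feats, [1.0, 0.5])
print(curriculum)  # shortest, least repetitive sentences first
```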
EDU 1: the problem of rare and unknown words is an important issue
EDU 2: that can potentially affect the performance of many nlp systems ,
EDU 3: including traditional count-based and deep learning models .
EDU 4: we propose a novel way
EDU 5: to deal with the rare and unseen words for the neural network models
EDU 6: using attention .
EDU 7: our model uses two softmax layers
EDU 8: in order to predict the next word in conditional language models :
EDU 9: one predicts the location of a word in the source sentence ,
EDU 10: and the other predicts a word in the shortlist vocabulary .
EDU 11: at each timestep , the decision
EDU 12: of which softmax layer to use
EDU 13: is adaptively made by an mlp
EDU 14: which is conditioned on the context .
EDU 15: we motivate this work from psychological evidence
EDU 16: that humans naturally have a tendency
EDU 17: to point towards objects in the context or the environment
EDU 18: when the name of an object is not known .
EDU 19: using our proposed model ,
EDU 20: we observe improvements on two tasks ,
EDU 21: neural machine translation on the europarl english to french parallel corpora and text summarization on the gigaword dataset .
EDU 0:
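The two-softmax construction in the abstract above can be sketched as a gated mixture: one softmax over the shortlist vocabulary, one over source positions, interpolated by a scalar gate. Here the gate is a toy constant standing in for the MLP's context-conditioned decision, and the logits are invented values:

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def pointer_mix(shortlist_logits, location_logits, gate):
    """Gate in [0, 1] interpolates between predicting a shortlist
    word and pointing at a source-sentence position, so the two
    heads together form one probability distribution."""
    p_short = softmax(shortlist_logits)
    p_loc = softmax(location_logits)
    return ([gate * p for p in p_short],
            [(1 - gate) * p for p in p_loc])

p_short, p_loc = pointer_mix([2.0, 0.0], [0.0, 1.0, 0.0], 0.7)
total = sum(p_short) + sum(p_loc)
print(round(total, 6))  # the combined mass sums to 1
```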
EDU 1: in this paper , we present a generalized transition-based parsing framework
EDU 2: where parsers are instantiated in terms of a set of control parameters
EDU 3: that constrain transitions between parser states .
EDU 4: this generalization provides a unified framework
EDU 5: to describe and compare various transition-based parsing approaches from both a theoretical and empirical perspective .
EDU 6: this includes well-known transition systems ,
EDU 7: but also previously unstudied systems .
EDU 0:
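One well-known transition system that such a generalized framework covers is arc-standard dependency parsing. A minimal sketch (unlabeled arcs, a hand-picked transition sequence, and not the paper's control-parameter formulation):

```python
def step(state, transition):
    """One transition of a simplified arc-standard system over
    (stack, buffer, arcs); positions stand in for words."""
    stack, buf, arcs = state
    if transition == "shift":
        return stack + [buf[0]], buf[1:], arcs
    s1, s0 = stack[-2], stack[-1]
    if transition == "left-arc":       # attach s1 under s0
        return stack[:-2] + [s0], buf, arcs | {(s0, s1)}
    if transition == "right-arc":      # attach s0 under s1
        return stack[:-1], buf, arcs | {(s1, s0)}
    raise ValueError(transition)

# Toy derivation for a 3-word sentence with heads 1<-0 and 2<-1.
state = ([], [0, 1, 2], frozenset())
for t in ["shift", "shift", "left-arc", "shift", "left-arc"]:
    state = step(state, t)
print(sorted(state[2]))  # recovered (head, dependent) arcs
```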
EDU 1: we present a transition-based system
EDU 2: that jointly predicts the syntactic structure and lexical units of a sentence
EDU 3: by building two structures over the input words :
EDU 4: a syntactic dependency tree and a forest of lexical units
EDU 5: including multiword expressions ( mwes ) .
EDU 6: this combined representation allows us to capture both the syntactic and semantic structure of mwes ,
EDU 7: which in turn enables deeper downstream semantic analysis ,
EDU 8: especially for semicompositional mwes .
EDU 9: the proposed system extends the arc-standard transition system for dependency parsing with transitions
EDU 10: for building complex lexical units .
EDU 11: experiments on two different data sets show
EDU 12: that the approach significantly improves mwe identification accuracy ( and sometimes syntactic accuracy )
EDU 13: compared to existing joint approaches .
EDU 0:
EDU 1: dynamic oracle training has shown substantial improvements for dependency parsing in various settings ,
EDU 2: but has not been explored for constituent parsing .
EDU 3: the present article introduces a dynamic oracle for transition-based constituent parsing .
EDU 4: experiments on the 0 languages of the spmrl dataset show
EDU 5: that a neural greedy parser with morphological features ,
EDU 6: trained with a dynamic oracle ,
EDU 7: leads to accuracies
EDU 8: comparable with the best non-reranking and non-ensemble parsers .
EDU 0:
EDU 1: metaphorical expressions are pervasive in natural language
EDU 2: and pose a substantial challenge for computational semantics .
EDU 3: the inherent compositionality of metaphor makes it an important test case for compositional distributional semantic models ( cdsms ) .
EDU 4: this paper is the first
EDU 5: to investigate
EDU 6: whether metaphorical composition warrants a distinct treatment in the cdsm framework .
EDU 7: we propose a method
EDU 8: to learn metaphors as linear transformations in a vector space
EDU 9: and find
EDU 10: that , across a variety of semantic domains , explicitly modeling metaphor improves the resulting semantic representations .
EDU 11: we then use these representations in a metaphor identification task ,
EDU 12: achieving a high performance of 0.00 in terms of f-score .
EDU 0:
EDU 1: idiom token classification is the task of deciding for a set of potentially idiomatic phrases
EDU 2: whether each occurrence of a phrase is a literal or idiomatic usage of the phrase .
EDU 3: in this work we explore the use of skip-thought vectors
EDU 4: to create distributed representations
EDU 5: that encode features
EDU 6: that are predictive with respect to idiom token classification .
EDU 7: we show
EDU 8: that classifiers
EDU 9: using these representations
EDU 10: have competitive performance
EDU 11: compared with the state of the art in idiom token classification .
EDU 12: importantly , however , our models use only the sentence
EDU 13: containing the target phrase as input
EDU 14: and are thus less dependent on a potentially inaccurate or incomplete model of discourse context .
EDU 15: we further demonstrate the feasibility of using these representations
EDU 16: to train a competitive general idiom token classifier .
EDU 0:
EDU 1: we present a novel method
EDU 2: for jointly learning compositional and noncompositional phrase embeddings
EDU 3: by adaptively weighting both types of embeddings
EDU 4: using a compositionality scoring function .
EDU 5: the scoring function is used
EDU 6: to quantify the level of compositionality of each phrase ,
EDU 7: and the parameters of the function are jointly optimized with the objective
EDU 8: for learning phrase embeddings .
EDU 9: in experiments , we apply the adaptive joint learning method to the task
EDU 10: of learning embeddings of transitive verb phrases ,
EDU 11: and show
EDU 12: that the compositionality scores have strong correlation with human ratings for verb-object compositionality ,
EDU 13: substantially outperforming the previous state of the art .
EDU 14: moreover , our embeddings improve upon the previous best model on a transitive verb disambiguation task .
EDU 15: we also show
EDU 16: that a simple ensemble technique further improves the results for both tasks .
EDU 0:
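The adaptive weighting in the abstract above can be sketched as a convex combination of a composed vector and a directly learned phrase vector. The vectors and the compositionality score `alpha` below are toy values; in the paper `alpha` comes from a jointly trained scoring function:

```python
def adaptive_phrase_vec(v_comp, v_noncomp, alpha):
    """Weight a compositionally built vector against a phrase
    vector trained as a single unit, by a compositionality score
    alpha in [0, 1]."""
    return [alpha * c + (1 - alpha) * n
            for c, n in zip(v_comp, v_noncomp)]

# High alpha: "eat apples" behaves compositionally.
# Low alpha: "kick the bucket" falls back on the learned unit.
composed = [1.0, 0.0]   # e.g. composition of the word vectors
learned = [0.0, 1.0]    # non-compositional phrase vector
print(adaptive_phrase_vec(composed, learned, 0.9))
print(adaptive_phrase_vec(composed, learned, 0.1))
```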
EDU 1: metaphor is a common linguistic tool in communication ,
EDU 2: making its detection in discourse a crucial task for natural language understanding .
EDU 3: one popular approach to this challenge is to capture semantic incohesion between a metaphor and the dominant topic of the surrounding text .
EDU 4: while these methods are effective ,
EDU 5: they tend to overclassify target words as metaphorical
EDU 6: when they deviate in meaning from their context .
EDU 7: we present a new approach
EDU 8: that ( 0 ) distinguishes literal and non-literal use of target words
EDU 9: by examining sentence-level topic transitions
EDU 10: and ( 0 ) captures the motivation of speakers
EDU 11: to express emotions and abstract concepts metaphorically .
EDU 12: experiments on an online breast cancer discussion forum dataset demonstrate a significant improvement in metaphor detection over the state-of-the-art .
EDU 13: these experimental results also reveal a tendency toward metaphor usage in personal topics and certain emotional contexts .
EDU 0:
EDU 1: neural networks are among the state-of-the-art techniques for language modeling .
EDU 2: existing neural language models typically map discrete words to distributed , dense vector representations .
EDU 3: after information processing of the preceding context words by hidden layers ,
EDU 4: an output layer estimates the probability of the next word .
EDU 5: such approaches are time- and memory-intensive
EDU 6: because of the large numbers of parameters for word embeddings and the output layer .
EDU 7: in this paper , we propose to compress neural language models by sparse word representations .
EDU 8: in the experiments , the number of parameters in our model increases very slowly with the growth of the vocabulary size ,
EDU 9: which is almost imperceptible .
EDU 10: moreover , our approach not only reduces the parameter space to a large extent ,
EDU 11: but also improves the performance in terms of the perplexity measure .
EDU 0:
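One way to read "compress neural language models by sparse word representations" is that a rare word's embedding is a sparse linear combination over a small base vocabulary, so the parameter count is governed by the base set rather than the full vocabulary. A hypothetical sketch; the base words and sparse codes below are invented:

```python
def embed(word, base_emb, sparse_codes):
    """Look up a base word directly; reconstruct any other word
    from its few nonzero coefficients over base embeddings."""
    if word in base_emb:
        return base_emb[word]
    dim = len(next(iter(base_emb.values())))
    vec = [0.0] * dim
    for base_word, coeff in sparse_codes[word]:   # few nonzeros
        for i, x in enumerate(base_emb[base_word]):
            vec[i] += coeff * x
    return vec

base = {"good": [1.0, 0.0], "very": [0.0, 1.0]}
codes = {"superb": [("good", 1.0), ("very", 0.5)]}
print(embed("superb", base, codes))
print(embed("good", base, codes))
```

Growing the vocabulary then only adds small code lists, not full dense vectors, which matches the slow parameter growth reported above.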
EDU 1: we introduce a new methodology for intrinsic evaluation of word representations .
EDU 2: specifically , we identify four fundamental criteria
EDU 3: based on the characteristics of natural language
EDU 4: that pose difficulties to nlp systems ;
EDU 5: and develop tests
EDU 6: that directly show
EDU 7: whether or not representations contain the subspaces necessary
EDU 8: to satisfy these criteria .
EDU 9: current intrinsic evaluations are mostly based on the overall similarity or full-space similarity of words
EDU 10: and thus view vector representations as points .
EDU 11: we show the limits of these point-based intrinsic evaluations .
EDU 12: we apply our evaluation methodology to the comparison of a count vector model and several neural network models
EDU 13: and demonstrate important properties of these models .
EDU 0:
EDU 1: a shared bilingual word embedding space ( sbwes ) is an indispensable resource in a variety of cross-language nlp and ir tasks .
EDU 2: a common approach to the sbwes induction is to learn a mapping function between monolingual semantic spaces ,
EDU 3: where the mapping critically relies on a seed word lexicon
EDU 4: used in the learning process .
EDU 5: in this work , we analyze the importance and properties of seed lexicons for the sbwes induction across different dimensions
EDU 6: ( i.e. , lexicon source , lexicon size , translation method , translation pair reliability ) .
EDU 7: on the basis of our analysis ,
EDU 8: we propose a simple but effective hybrid bilingual word embedding ( bwe ) model .
EDU 9: this model ( hybwe ) learns the mapping between two monolingual embedding spaces
EDU 10: using only highly reliable symmetric translation pairs from a seed document-level embedding space .
EDU 11: we perform bilingual lexicon learning ( bll ) with 0 language pairs
EDU 12: and show
EDU 13: that by carefully selecting reliable translation pairs
EDU 14: our new hybwe model outperforms benchmarking bwe learning models ,
EDU 15: all of which use more expensive bilingual signals .
EDU 16: effectively , we demonstrate
EDU 17: that an sbwes may be induced
EDU 18: by leveraging only a very weak bilingual signal ( document alignments ) along with monolingual data .
EDU 0:
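Learning a mapping between two monolingual embedding spaces from a seed lexicon, as described above, can be sketched as linear regression: minimise the squared error of W x against y over the seed translation pairs. The two-dimensional toy embeddings and plain gradient descent below are illustrative, not the paper's hybwe model:

```python
def learn_mapping(pairs, src, tgt, dim=2, lr=0.1, steps=500):
    """Fit a linear map W from the source space to the target
    space over seed-lexicon pairs by gradient descent on
    ||W x - y||^2."""
    W = [[0.0] * dim for _ in range(dim)]
    for _ in range(steps):
        for s, t in pairs:
            x, y = src[s], tgt[t]
            pred = [sum(W[i][j] * x[j] for j in range(dim))
                    for i in range(dim)]
            for i in range(dim):
                for j in range(dim):
                    W[i][j] -= lr * 2 * (pred[i] - y[i]) * x[j]
    return W

# Invented seed lexicon and embeddings.
src = {"hund": [1.0, 0.0], "katze": [0.0, 1.0]}
tgt = {"dog": [0.0, 1.0], "cat": [1.0, 0.0]}
W = learn_mapping([("hund", "dog"), ("katze", "cat")], src, tgt)
mapped = [sum(W[i][j] * src["hund"][j] for j in range(2)) for i in range(2)]
print([round(v, 3) for v in mapped])  # close to tgt["dog"]
```

The quality of the seed pairs directly shapes W, which is why the analysis above varies lexicon source, size, and reliability.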
EDU 1: we propose a brand new `` liberal '' event extraction paradigm
EDU 2: to extract events and discover event schemas from any input corpus simultaneously .
EDU 3: we incorporate symbolic
EDU 4: ( e.g. , abstract meaning representation )
EDU 5: and distributional semantics
EDU 6: to detect and represent event structures
EDU 7: and adopt a joint typing framework
EDU 8: to simultaneously extract event types and argument roles
EDU 9: and discover an event schema .
EDU 10: experiments on general and specific domains demonstrate
EDU 11: that this framework can construct high-quality schemas with many event and argument role types ,
EDU 12: covering a high proportion of event types and argument roles in manually defined schemas .
EDU 13: we show
EDU 14: that extraction performance
EDU 15: using discovered schemas
EDU 16: is comparable to supervised models
EDU 17: trained from a large amount of data
EDU 18: labeled according to predefined event types .
EDU 19: the extraction quality of new event types is also promising .
EDU 0:
EDU 1: event extraction from texts aims to detect structured information
EDU 2: such as what has happened , to whom , where and when .
EDU 3: event extraction and visualization are typically considered as two different tasks .
EDU 4: in this paper , we propose a novel approach
EDU 5: based on probabilistic modelling
EDU 6: to jointly extract and visualize events from tweets
EDU 7: where both tasks benefit from each other .
EDU 8: we model each event as a joint distribution over named entities , a date , a location and event-related keywords .
EDU 9: moreover , both tweets and event instances are associated with coordinates in the visualization space .
EDU 10: the manifold assumption
EDU 11: that the intrinsic geometry of tweets is a low-rank , non-linear manifold within the high-dimensional space
EDU 12: is incorporated into the learning framework
EDU 13: using a regularization .
EDU 14: experimental results show
EDU 15: that the proposed approach can effectively deal with both event extraction and visualization
EDU 16: and performs remarkably better than both the state-of-the-art event extraction method and a pipeline approach for event extraction and visualization .
EDU 0:
EDU 1: there is a small but growing body of research on statistical scripts , models of event sequences
EDU 2: that allow probabilistic inference of implicit events from documents .
EDU 3: these systems operate on structured verb-argument events
EDU 4: produced by an nlp pipeline .
EDU 5: we compare these systems with recent recurrent neural net models
EDU 6: that directly operate on raw tokens
EDU 7: to predict sentences ,
EDU 8: finding the latter to be roughly comparable to the former
EDU 9: in terms of predicting missing events in documents .
EDU 0:
EDU 1: natural language understanding often requires deep semantic knowledge .
EDU 2: expanding on previous proposals ,
EDU 3: we suggest
EDU 4: that some important aspects of semantic knowledge can be modeled as a language model
EDU 5: if done at an appropriate level of abstraction .
EDU 6: we develop two distinct models
EDU 7: that capture semantic frame chains and discourse information
EDU 8: while abstracting over the specific mentions of predicates and entities .
EDU 9: for each model , we investigate four implementations :
EDU 10: a `` standard '' n-gram language model and three discriminatively trained `` neural '' language models
EDU 11: that generate embeddings for semantic frames .
EDU 12: the quality of the semantic language models ( semlm ) is evaluated both intrinsically ,
EDU 13: using perplexity and a narrative cloze test
EDU 14: and extrinsically -
EDU 15: we show
EDU 16: that our semlm helps improve performance on semantic natural language processing tasks
EDU 17: such as co-reference resolution and discourse parsing .
EDU 0:
EDU 1: domain adaptation is an important research topic in the sentiment analysis area .
EDU 2: existing domain adaptation methods usually transfer sentiment knowledge from only one source domain to the target domain .
EDU 3: in this paper , we propose a new domain adaptation approach
EDU 4: which can exploit sentiment knowledge from multiple source domains .
EDU 5: we first extract both global and domain-specific sentiment knowledge from the data of multiple source domains
EDU 6: using multi-task learning .
EDU 7: then we transfer them to the target domain
EDU 8: with the help of words ' sentiment polarity relations
EDU 9: extracted from the unlabeled target domain data .
EDU 10: the similarities between target domain and different source domains are also incorporated into the adaptation process .
EDU 11: experimental results on benchmark dataset show the effectiveness of our approach
EDU 12: in improving cross-domain sentiment classification performance .
EDU 0:
EDU 1: through a particular choice of a predicate
EDU 2: ( e.g. , `` x violated y '' ) ,
EDU 3: a writer can subtly connote a range of implied sentiment and presupposed facts about the entities x and y
EDU 4: : ( 0 ) writer 's perspective :
EDU 5: projecting x as an `` antagonist '' and y as a `` victim '' ,
EDU 6: ( 0 ) entities ' perspective : y probably dislikes x ,
EDU 7: ( 0 ) effect : something bad happened to y ,
EDU 8: ( 0 ) value : y is something valuable ,
EDU 9: and ( 0 ) mental state : y is distressed by the event .
EDU 10: we introduce connotation frames as a representation formalism
EDU 11: to organize these rich dimensions of connotation
EDU 12: using typed relations .
EDU 13: first , we investigate the feasibility
EDU 14: of obtaining connotative labels through crowdsourcing experiments .
EDU 15: we then present models
EDU 16: for predicting the connotation frames of verb predicates
EDU 17: based on their distributional word representations and the interplay between different types of connotative relations .
EDU 18: empirical results confirm
EDU 19: that connotation frames can be induced from various data sources
EDU 20: that reflect
EDU 21: how language is used in context .
EDU 22: we conclude with analytical results
EDU 23: that show the potential use of connotation frames
EDU 24: for analyzing subtle biases in online news media .
EDU 0:
EDU 1: sentiment classification aims to automatically predict sentiment polarity
EDU 2: ( e.g. , positive or negative )
EDU 3: of user generated sentiment data ( e.g. , reviews , blogs ) .
EDU 4: due to the mismatch among different domains ,
EDU 5: a sentiment classifier
EDU 6: trained in one domain
EDU 7: may not work well
EDU 8: when directly applied to other domains .
EDU 9: thus , domain adaptation algorithms for sentiment classification are highly desirable to reduce the domain discrepancy and manual labeling costs .
EDU 10: to address the above challenge ,
EDU 11: we propose a novel domain adaptation method ,
EDU 12: called bi-transferring deep neural networks ( btdnns ) .
EDU 13: the proposed btdnns attempts to transfer the source domain examples to the target domain ,
EDU 14: and also transfer the target domain examples to the source domain .
EDU 15: the linear transformation of btdnns ensures the feasibility of transferring between domains ,
EDU 16: and the distribution consistency between the transferred domain and the desirable domain is constrained in a linear data reconstruction manner .
EDU 17: as a result , the transferred source domain is supervised
EDU 18: and follows similar distribution as the target domain .
EDU 19: therefore , any supervised method can be used on the transferred source domain
EDU 20: to train a classifier for sentiment classification in a target domain .
EDU 21: we conduct experiments on a benchmark
EDU 22: composed of reviews of 0 types of amazon products .
EDU 23: experimental results show
EDU 24: that our proposed approach significantly outperforms several baseline methods ,
EDU 25: and achieves an accuracy
EDU 26: which is competitive with the state-of-the-art method for domain adaptation .
EDU 0:
EDU 1: we present a new approach for document-level sentiment inference ,
EDU 2: where the goal is to predict directed opinions
EDU 3: ( who feels positively or negatively towards whom )
EDU 4: for all entities
EDU 5: mentioned in a text .
EDU 6: to encourage more complete and consistent predictions ,
EDU 7: we introduce an ilp
EDU 8: that jointly models
EDU 9: ( 0 ) sentence- and discourse-level sentiment cues ,
EDU 10: ( 0 ) factual evidence about entity factions , and
EDU 11: ( 0 ) global constraints
EDU 12: based on social science theories
EDU 13: such as homophily , social balance , and reciprocity .
EDU 14: together , these cues allow for rich inference across groups of entities ,
EDU 15: including for example that ceos and the companies
EDU 16: they lead
EDU 17: are likely to have similar sentiment towards others .
EDU 18: we evaluate performance on new , densely labeled data
EDU 19: that provides supervision for all pairs ,
EDU 20: complementing previous work
EDU 21: that only labeled pairs
EDU 22: mentioned in the same sentence .
EDU 23: experiments demonstrate
EDU 24: that the global model outperforms sentence-level baselines ,
EDU 25: by providing more coherent predictions across sets of related entities .
EDU 0:
EDU 1: different from traditional active learning
EDU 2: based on sentence-wise full annotation ( fa ) ,
EDU 3: this paper proposes active learning with dependency-wise partial annotation ( pa ) as a finer-grained unit for dependency parsing .
EDU 4: at each iteration , we select a few most uncertain words from an unlabeled data pool ,
EDU 5: manually annotate their syntactic heads ,
EDU 6: and add the partial trees into labeled data for parser retraining .
EDU 7: compared with sentence-wise fa ,
EDU 8: dependency-wise pa gives us more flexibility in task selection
EDU 9: and avoids wasting time on annotating trivial tasks in a sentence .
EDU 10: our work makes the following contributions .
EDU 11: first , we are the first to apply a probabilistic model to active learning for dependency parsing ,
EDU 12: which can 0 ) provide tree probabilities and dependency marginal probabilities as principled uncertainty metrics ,
EDU 13: and 0 ) directly learn parameters from pa
EDU 14: based on a forest-based training objective .
EDU 15: second , we propose and compare several uncertainty metrics through simulation experiments on both chinese and english .
EDU 16: finally , we conduct human annotation experiments
EDU 17: to compare fa and pa on real annotation time and quality .
EDU 0:
EDU 1: we present a novel dependency parsing method
EDU 2: which enforces two structural properties on dependency trees :
EDU 3: bounded block degree and well-nestedness .
EDU 4: these properties are useful to better represent the set of admissible dependency structures in treebanks
EDU 5: and connect dependency parsing to context-sensitive grammatical formalisms .
EDU 6: we cast this problem as an integer linear program
EDU 7: that we solve with lagrangian relaxation
EDU 8: from which we derive a heuristic and an exact method
EDU 9: based on a branch-and-bound search .
EDU 10: experimentally , we see
EDU 11: that these methods are efficient and competitive
EDU 12: compared to a baseline unconstrained parser ,
EDU 13: while enforcing structural properties in all cases .
EDU 0:
EDU 1: continuous space word embeddings have received a great deal of attention in the natural language processing and machine learning communities for their ability
EDU 2: to model term similarity and other relationships .
EDU 3: we study the use of term relatedness in the context of query expansion for ad hoc information retrieval .
EDU 4: we demonstrate
EDU 5: that word embeddings such as word0vec and glove ,
EDU 6: when trained globally ,
EDU 7: underperform corpus and query specific embeddings for retrieval tasks .
EDU 8: these results suggest
EDU 9: that other tasks
EDU 10: benefiting from global embeddings
EDU 11: may also benefit from local embeddings .
EDU 0:
EDU 1: community question answering ( cqa ) services
EDU 2: like yahoo answers , baidu zhidao , quora , stackoverflow etc.
EDU 3: provide a platform for interaction with experts
EDU 4: and help users to obtain precise and accurate answers to their questions .
EDU 5: the time lag between the user
EDU 6: posting a question
EDU 7: and receiving its answer
EDU 8: could be reduced
EDU 9: by retrieving similar historic questions from the cqa archives .
EDU 10: the main challenge in this task is the `` lexico-syntactic '' gap between the current and the previous questions .
EDU 11: in this paper , we propose a novel approach
EDU 12: called `` siamese convolutional neural network for cqa ( scqa ) ''
EDU 13: to find the semantic similarity between the current and the archived questions .
EDU 14: scqa consists of twin convolutional neural networks with shared parameters and a contrastive loss function
EDU 15: joining them .
EDU 16: scqa learns the similarity metric for question-question pairs
EDU 17: by leveraging the question-answer pairs available in cqa forum archives .
EDU 18: the model projects semantically similar question pairs nearer to each other and dissimilar question pairs farther away from each other in the semantic space .
EDU 19: experiments on a large-scale real-life `` yahoo answers '' dataset reveal
EDU 20: that scqa outperforms current state-of-the-art approaches
EDU 21: based on translation models , topic models and deep neural network based models
EDU 22: which use non-shared parameters .
EDU 0:
EDU 1: in this work , we focus on the problem of news citation recommendation .
EDU 2: the task aims to recommend news citations for both authors and readers
EDU 3: to create and search news references .
EDU 4: due to the sparsity issue of news citations and the engineering difficulty
EDU 5: in obtaining information on authors ,
EDU 6: we focus on content similarity-based methods instead of collaborative filtering-based approaches .
EDU 7: in this paper , we explore word embedding
EDU 8: ( i.e. , implicit semantics )
EDU 9: and grounded entities
EDU 10: ( i.e. , explicit semantics )
EDU 11: to address the variety and ambiguity issues of language .
EDU 12: we formulate the problem as a reranking task
EDU 13: and integrate different similarity measures under the learning to rank framework .
EDU 14: we evaluate our approach on a real-world dataset .
EDU 15: the experimental results show the efficacy of our method .
EDU 0:
EDU 1: grapheme-to-phoneme ( g0p ) models are rarely available in low-resource languages ,
EDU 2: as the creation of training and evaluation data is expensive and time-consuming .
EDU 3: we use wiktionary to obtain more than 000k word-pronunciation pairs in more than 000 languages .
EDU 4: we then develop phoneme and language distance metrics
EDU 5: based on phonological and linguistic knowledge ;
EDU 6: applying those ,
EDU 7: we adapt g0p models for high-resource languages
EDU 8: to create models for related low-resource languages .
EDU 9: we provide results for models for 000 adapted languages .
EDU 0:
EDU 1: most previous approaches to chinese word segmentation formalize this problem as a character-based sequence labeling task
EDU 2: so that only contextual information within fixed sized local windows and simple interactions between adjacent tags can be captured .
EDU 3: in this paper , we propose a novel neural framework
EDU 4: which thoroughly eliminates context windows
EDU 5: and can utilize complete segmentation history .
EDU 6: our model employs a gated combination neural network over characters
EDU 7: to produce distributed representations of word candidates ,
EDU 8: which are then given to a long short-term memory ( lstm ) language scoring model .
EDU 9: experiments on the benchmark datasets show
EDU 10: that without the help of feature engineering as in most existing approaches ,
EDU 11: our models achieve competitive or better performances compared with previous state-of-the-art methods .
EDU 0:
EDU 1: character-based and word-based methods are two main types of statistical models for chinese word segmentation ,
EDU 2: the former exploiting sequence labeling models over characters
EDU 3: and the latter typically exploiting a transition-based model ,
EDU 4: with the advantages
EDU 5: that word-level features can be easily utilized .
EDU 6: neural models have been exploited for character-based chinese word segmentation ,
EDU 7: giving high accuracies
EDU 8: by making use of external character embeddings ,
EDU 9: yet requiring less feature engineering .
EDU 10: in this paper , we study a neural model for word-based chinese word segmentation ,
EDU 11: by replacing the manually designed discrete features with neural features in a word-based segmentation framework .
EDU 12: experimental results demonstrate
EDU 13: that word features lead to comparable performances to the best systems in the literature ,
EDU 14: and a further combination of discrete and neural features gives top accuracies .
EDU 0:
EDU 1: understanding unstructured text is a major goal within natural language processing .
EDU 2: comprehension tests pose questions
EDU 3: based on short text passages
EDU 4: to evaluate such understanding .
EDU 5: in this work , we investigate machine comprehension on the challenging mctest benchmark .
EDU 6: partly because of its limited size ,
EDU 7: prior work on mctest has focused mainly on engineering better features .
EDU 8: we tackle the dataset with a neural approach ,
EDU 9: harnessing simple neural networks
EDU 10: arranged in a parallel hierarchy .
EDU 11: the parallel hierarchy enables our model to compare the passage , question , and answer from a variety of trainable perspectives ,
EDU 12: as opposed to using a manually designed , rigid feature set .
EDU 13: perspectives range from the word level to sentence fragments to sequences of sentences ;
EDU 14: the networks operate only on word-embedding representations of text .
EDU 15: when trained with a methodology
EDU 16: designed to help cope with limited training data ,
EDU 17: our parallel-hierarchical model sets a new state of the art for mctest ,
EDU 18: outperforming previous feature-engineered approaches slightly
EDU 19: and previous neural approaches by a significant margin
EDU 20: ( over 00 percentage points ) .
EDU 0:
EDU 1: broad domain question answering is often difficult in the absence of structured knowledge bases ,
EDU 2: and can benefit from shallow lexical methods ( broad coverage ) and logical reasoning ( high precision ) .
EDU 3: we propose an approach
EDU 4: for incorporating both of these signals in a unified framework
EDU 5: based on natural logic .
EDU 6: we extend the breadth of inferences
EDU 7: afforded by natural logic
EDU 8: to include relational entailment
EDU 9: ( e.g. , buy -> own )
EDU 10: and meronymy
EDU 11: ( e.g. , a person
EDU 12: born in a city
EDU 13: is born in the city 's country ) .
EDU 14: furthermore , we train an evaluation function
EDU 15: - akin to gameplaying -
EDU 16: to evaluate the expected truth of candidate premises on the fly .
EDU 17: we evaluate our approach on answering multiple choice science questions ,
EDU 18: achieving strong results on the dataset .
EDU 0:
EDU 1: cognitive science researchers have emphasized the importance
EDU 2: of ordering a complex task into a sequence of easy-to-hard problems .
EDU 3: such an ordering provides an easier path to learning
EDU 4: and increases the speed of acquisition of the task
EDU 5: compared to conventional learning .
EDU 6: recent works in machine learning have explored a curriculum learning approach
EDU 7: called self-paced learning
EDU 8: which orders data samples on the easiness scale
EDU 9: so that easy samples can be introduced to the learning algorithm first
EDU 10: and harder samples can be introduced successively .
EDU 11: we introduce a number of heuristics
EDU 12: that improve upon self-paced learning .
EDU 13: then , we argue
EDU 14: that incorporating an easy , yet diverse , set of samples can further improve learning .
EDU 15: we compare these curriculum learning proposals in the context of four non-convex models for qa
EDU 16: and show
EDU 17: that they lead to real improvements in each of them .
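Self-paced learning as described above, admitting easy ( low-loss ) samples first and harder ones later, can be sketched in a few lines. This is a minimal illustration on a hypothetical one-parameter regression with synthetic data, not the models or heuristics the abstract evaluates: sample weights v_i are set to 1 when the current loss falls below a threshold `lam`, and `lam` grows each step so that harder samples enter the curriculum later.

```python
# Minimal self-paced learning sketch (illustrative only): at each outer step,
# only samples whose current loss is below the threshold `lam` contribute to
# the gradient update, and `lam` grows so harder samples are admitted later.
import random

def self_paced_fit(xs, ys, lam=0.5, growth=1.5, steps=30, lr=0.5):
    w = 0.0  # one-parameter model y ~ w * x, chosen for simplicity
    for _ in range(steps):
        losses = [(w * x - y) ** 2 for x, y in zip(xs, ys)]
        # v_i = 1 for "easy" samples (loss below lam), 0 otherwise
        v = [1 if l < lam else 0 for l in losses]
        grad = sum(vi * 2 * (w * x - y) * x for vi, x, y in zip(v, xs, ys))
        n = max(1, sum(v))
        w -= lr * grad / n          # update on the admitted subset only
        lam *= growth               # relax the threshold each step
    return w

random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(50)]
ys = [2.0 * x for x in xs]          # easy, noise-free samples (true w = 2)
ys[0] += 5.0                        # one hard outlier, admitted late
w = self_paced_fit(xs, ys)
```

Because the outlier only enters the curriculum once `lam` is large, the fit stays close to the clean-data solution w = 2.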
EDU 0:
EDU 1: passage-level question answer matching is a challenging task
EDU 2: since it requires effective representations
EDU 3: that capture the complex semantic relations between questions and answers .
EDU 4: in this work , we propose a series of deep learning models
EDU 5: to address passage answer selection .
EDU 6: to match passage answers to questions
EDU 7: accommodating their complex semantic relations ,
EDU 8: unlike most previous work
EDU 9: that utilizes a single deep learning structure ,
EDU 10: we develop hybrid models
EDU 11: that process the text
EDU 12: using both convolutional and recurrent neural networks ,
EDU 13: combining the merits of both structures in extracting linguistic information .
EDU 14: additionally , we develop a simple but effective attention mechanism
EDU 15: for the purpose of constructing better answer representations according to the input question ,
EDU 16: which is imperative for better modeling long answer sequences .
EDU 17: the results on two public benchmark datasets , insuranceqa and trec-qa , show
EDU 18: that our proposed models outperform a variety of strong baselines .
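The attention mechanism described above, constructing answer representations conditioned on the input question, can be illustrated with a minimal pooling sketch. This is an assumption-laden toy version ( mean-pooled question vector , dot-product scores , hand-made 2-d embeddings ), not the paper's hybrid CNN/RNN models:

```python
# Question-conditioned attention pooling (toy sketch): each answer-token
# vector is scored against a pooled question vector, scores are softmaxed,
# and the answer representation is the attention-weighted sum of tokens.
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attentive_pool(question_vecs, answer_vecs):
    dim = len(question_vecs[0])
    # question representation: simple mean pooling over its token vectors
    q = [sum(v[i] for v in question_vecs) / len(question_vecs) for i in range(dim)]
    scores = [dot(q, a) for a in answer_vecs]
    m = max(scores)                         # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # attention-weighted sum of answer-token vectors
    return [sum(w * a[i] for w, a in zip(weights, answer_vecs)) for i in range(dim)]

# toy 2-d embeddings: the second answer token aligns with the question
question = [[1.0, 0.0], [1.0, 0.0]]
answer = [[0.0, 1.0], [1.0, 0.0], [0.0, -1.0]]
pooled = attentive_pool(question, answer)
```

The pooled answer vector is dominated by the token most similar to the question, which is the point of conditioning the answer representation on the question.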
EDU 0:
EDU 1: question answering requires access to a knowledge base
EDU 2: to check facts and reason about information .
EDU 3: knowledge in the form of natural language text is easy to acquire ,
EDU 4: but difficult for automated reasoning .
EDU 5: highly-structured knowledge bases can facilitate reasoning ,
EDU 6: but are difficult to acquire .
EDU 7: in this paper we explore tables as a semi-structured formalism
EDU 8: that provides a balanced compromise to this tradeoff .
EDU 9: we first use the structure of tables
EDU 10: to guide the construction of a dataset of over 0000 multiple-choice questions with rich alignment annotations ,
EDU 11: easily and efficiently via crowd-sourcing .
EDU 12: we then use this annotated data
EDU 13: to train a semi-structured feature-driven model for question answering
EDU 14: that uses tables as a knowledge base .
EDU 15: in benchmark evaluations , we significantly outperform both a strong unstructured retrieval baseline and a highly-structured markov logic network model .
EDU 0:
EDU 1: traditional approaches to extractive summarization rely heavily on human-engineered features .
EDU 2: in this work we propose a data-driven approach
EDU 3: based on neural networks and continuous sentence features .
EDU 4: we develop a general framework for single-document summarization
EDU 5: composed of a hierarchical document encoder and an attention-based extractor .
EDU 6: this architecture allows us to develop different classes of summarization models
EDU 7: which can extract sentences or words .
EDU 8: we train our models on large scale corpora
EDU 9: containing hundreds of thousands of document-summary pairs .
EDU 10: experimental results on two summarization datasets demonstrate
EDU 11: that our models obtain results comparable to the state of the art
EDU 12: without any access to linguistic annotation .
EDU 0:
EDU 1: automatic negation scope detection is a task
EDU 2: that has been tackled
EDU 3: using different classifiers and heuristics .
EDU 4: most systems , however , are 0 ) highly-engineered ,
EDU 5: 0 ) english-specific ,
EDU 6: and 0 ) only tested on the same genre
EDU 7: they were trained on .
EDU 8: we start by addressing 0 ) and 0 )
EDU 9: using a neural network architecture .
EDU 10: results
EDU 11: obtained on data from the *sem0000 shared task on negation scope detection
EDU 12: show
EDU 13: that even a simple feed-forward neural network
EDU 14: using word-embedding features alone ,
EDU 15: performs on par with earlier classifiers , with a bi-directional lstm
EDU 16: outperforming all of them .
EDU 17: we then address 0 ) by means of a specially-designed synthetic test set ;
EDU 18: in doing so ,
EDU 19: we explore the problem of detecting negation scope in more depth
EDU 20: and show
EDU 21: that performance suffers from genre effects
EDU 22: and differs with the type of negation
EDU 23: considered .
EDU 0:
EDU 1: most sentence embedding models typically represent each sentence
EDU 2: using only surface word forms ,
EDU 3: which makes these models indiscriminative for ubiquitous homonymy and polysemy .
EDU 4: in order to enhance the representation capability of sentences ,
EDU 5: we employ a conceptualization model
EDU 6: to assign associated concepts for each sentence in the text corpus ,
EDU 7: and then learn conceptual sentence embedding ( cse ) .
EDU 8: hence , this semantic representation is more expressive than some widely-used text representation models
EDU 9: such as latent topic models ,
EDU 10: especially for short texts .
EDU 11: moreover , we further extend cse models
EDU 12: by utilizing a local attention-based model
EDU 13: that selects relevant words within the context
EDU 14: to make more efficient predictions .
EDU 15: in the experiments , we evaluate the cse models on two tasks ,
EDU 16: text classification and information retrieval .
EDU 17: the experimental results show
EDU 18: that the proposed models outperform typical sentence embedding models .
EDU 0:
EDU 1: most current chatbot engines are designed to reply to user utterances
EDU 2: based on existing utterance-response ( or q-r ) pairs .
EDU 3: in this paper , we present docchat ,
EDU 4: a novel information retrieval approach for chatbot engines
EDU 5: that can leverage unstructured documents , instead of q-r pairs ,
EDU 6: to respond to utterances .
EDU 7: a learning-to-rank model with features
EDU 8: designed at different levels of granularity
EDU 9: is proposed
EDU 10: to measure the relevance between utterances and responses directly .
EDU 11: we evaluate our proposed approach in both english and chinese
EDU 12: : ( i ) for english , we evaluate docchat on wikiqa and qasent ,
EDU 13: two answer sentence selection tasks ,
EDU 14: and compare it with state-of-the-art methods .
EDU 15: reasonable improvements and good adaptability are observed .
EDU 16: ( ii ) for chinese , we compare docchat with xiaoice ,
EDU 17: a famous chitchat engine in china ,
EDU 18: and side-by-side evaluation shows
EDU 19: that docchat is a perfect complement for chatbot engines
EDU 20: using q-r pairs as the main source of responses .
EDU 0:
EDU 1: in conversation , speakers tend to `` accommodate '' or `` align '' to their partners ,
EDU 2: changing the style and substance of their communications
EDU 3: to be more similar to their partners ' utterances .
EDU 4: we focus here on `` linguistic alignment , '' changes in word choice
EDU 5: based on others ' choices .
EDU 6: although linguistic alignment is observed across many different contexts
EDU 7: and its degree correlates with important social factors
EDU 8: such as power and likability ,
EDU 9: its sources are still uncertain .
EDU 10: we build on a recent probabilistic model of alignment ,
EDU 11: using it to separate out alignment attributable to words versus word categories .
EDU 12: we model alignment in two contexts :
EDU 13: telephone conversations and microblog replies .
EDU 14: our results show evidence of alignment ,
EDU 15: but it is primarily lexical rather than categorical .
EDU 16: furthermore , we find
EDU 17: that discourse acts modulate alignment substantially .
EDU 18: this evidence supports the view
EDU 19: that alignment is shaped by strategic communicative processes
EDU 20: related to the ongoing discourse .
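A crude way to quantify lexical alignment of the kind discussed above is to compare how often a reply uses a word category when the preceding message did versus when it did not. The sketch below is a simple difference-of-rates statistic on hypothetical toy data, far simpler than the probabilistic model the abstract builds on:

```python
# Difference-of-rates alignment statistic (toy sketch): a positive score
# means replies use the category more often when the prime used it.
def alignment_score(pairs, category):
    """pairs: list of (prime_tokens, reply_tokens); category: set of words."""
    with_prime = [r for p, r in pairs if category & set(p)]
    without_prime = [r for p, r in pairs if not category & set(p)]

    def rate(replies):
        if not replies:
            return 0.0
        return sum(1 for r in replies if category & set(r)) / len(replies)

    return rate(with_prime) - rate(without_prime)

# hypothetical mini-dialogue data for illustration
pronouns = {"i", "you", "we"}
pairs = [
    (["do", "you", "agree"], ["i", "do"]),     # pronoun prime, pronoun reply
    (["we", "should", "go"], ["you", "first"]),
    (["nice", "weather"], ["very", "sunny"]),  # no pronoun prime, no echo
    (["good", "morning"], ["morning"]),
]
score = alignment_score(pairs, pronouns)
```

On this toy data the score is maximal, since replies mirror the category exactly when it was primed; real corpora yield much smaller, noisier differences.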
EDU 0:
EDU 1: the applicability of entropy rate constancy to dialogue is examined on two spoken dialogue corpora .
EDU 2: the principle is found to hold ;
EDU 3: however , new entropy change patterns within the topic episodes of dialogue are described ,
EDU 4: which differ from those of written text .
EDU 5: speakers ' dynamic roles as topic initiators and topic responders are associated with decreasing and increasing entropy , respectively ,
EDU 6: which results in local convergence between these speakers in each topic episode .
EDU 7: this implies
EDU 8: that the sentence entropy in dialogue is conditioned on different contexts
EDU 9: determined by the speaker 's roles .
EDU 10: explanations from the perspectives of grounding theory and interactive alignment are discussed ,
EDU 11: resulting in a novel , unified information-theoretic approach to dialogue .
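The quantity being tracked above, sentence entropy across a dialogue, can be illustrated with a unigram estimate: the mean negative log2 word probability per sentence, with the model estimated from the corpus itself. This is only a sketch of the measurement, not the paper's analysis:

```python
# Per-sentence entropy under a unigram model (illustrative sketch):
# entropy = mean of -log2 p(word), with p estimated from the corpus itself.
import math
from collections import Counter

def sentence_entropies(sentences):
    counts = Counter(w for s in sentences for w in s)
    total = sum(counts.values())

    def h(sentence):
        return sum(-math.log2(counts[w] / total) for w in sentence) / len(sentence)

    return [h(s) for s in sentences]

# hypothetical mini-dialogue, one tokenized sentence per turn
dialogue = [
    ["so", "about", "the", "trip"],
    ["the", "trip", "sounds", "great"],
    ["great", "so", "when", "do", "we", "leave"],
]
ents = sentence_entropies(dialogue)
```

Tracking such per-sentence values within topic episodes is how the decreasing/increasing patterns described above would show up.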
EDU 0:
EDU 1: to establish sophisticated dialogue systems ,
EDU 2: text planning needs to cope with congruent as well as incongruent interlocutor interests
EDU 3: as given in everyday dialogues .
EDU 4: little attention has been given to this topic in text planning in contrast to dialogues
EDU 5: that are fully aligned with anticipated user interests .
EDU 6: when considering dialogues with congruent and incongruent interlocutor interests ,
EDU 7: dialogue partners are facing the constant challenge
EDU 8: of finding a balance between cooperation and competition .
EDU 9: we introduce the concept of fairness
EDU 10: that operationalizes an equal and adequate ,
EDU 11: i.e. equitable satisfaction of all interlocutors ' interests .
EDU 12: focusing on question-answering ( qa ) settings ,
EDU 13: we describe an answer planning approach
EDU 14: that supports fair dialogues under congruent and incongruent interlocutor interests .
EDU 15: due to the fact
EDU 16: that fairness is subjective per se ,
EDU 17: we present promising results from an empirical study ( n=000 )
EDU 18: in which human subjects interacted with a qa system
EDU 19: implementing the proposed approach .
EDU 0:
EDU 1: modeling interactions between two sentences is crucial for a number of natural language processing tasks
EDU 2: including answer selection , dialogue act analysis , etc .
EDU 3: while deep learning methods like recurrent neural networks or convolutional neural networks have proved powerful for sentence modeling ,
EDU 4: prior studies have paid less attention to interactions between sentences .
EDU 5: in this work , we propose a sentence interaction network ( sin )
EDU 6: for modeling the complex interactions between two sentences .
EDU 7: by introducing `` interaction states '' for word and phrase pairs ,
EDU 8: sin is powerful and flexible in capturing sentence interactions for different tasks .
EDU 9: we obtain significant improvements on answer selection and dialogue act analysis
EDU 10: without any feature engineering .
EDU 0:
EDU 1: in this study , we introduce a nondeterministic method
EDU 2: for referring expression generation .
EDU 3: we describe two models
EDU 4: that account for individual variation in the choice of referential form in automatically generated text :
EDU 5: a naive bayes model and a recurrent neural network .
EDU 6: both are evaluated
EDU 7: using the vareg corpus .
EDU 8: then we select the best performing model
EDU 9: to generate referential forms in texts from the grec-0.0 corpus
EDU 10: and conduct an evaluation experiment
EDU 11: in which humans judge the coherence and comprehensibility of the generated texts ,
EDU 12: comparing them both with the original references and those
EDU 13: produced by a random baseline model .
EDU 0:
EDU 1: how much is 000 million us dollars ?
EDU 2: to help readers put such numbers in context ,
EDU 3: we propose a new task of automatically generating short descriptions
EDU 4: known as perspectives ,
EDU 5: e.g. `` $ 000 million is about the cost
EDU 6: to employ everyone in texas over a lunch period '' .
EDU 7: first , we collect a dataset of numeric mentions in news articles ,
EDU 8: where each mention is labeled with a set of rated perspectives .
EDU 9: we then propose a system
EDU 10: to generate these descriptions
EDU 11: consisting of two steps :
EDU 12: formula construction and description generation .
EDU 13: in construction , we compose formulas from numeric facts in a knowledge base
EDU 14: and rank the resulting formulas
EDU 15: based on familiarity , numeric proximity and semantic compatibility .
EDU 16: in generation , we convert a formula into natural language
EDU 17: using a sequence-to-sequence recurrent neural network .
EDU 18: our system obtains a 00.0 % f0 improvement over a non-compositional baseline at formula construction and a 00.0 bleu point improvement over a baseline at description generation .
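One of the ranking signals named above, numeric proximity, can be sketched as the distance on a log scale between a candidate formula's value and the target quantity. The candidate figures below are hypothetical, and the real system combines this signal with familiarity and semantic compatibility, which are omitted here:

```python
# Numeric-proximity ranking (toy sketch): score each candidate formula by
# |log(value) - log(target)| and return candidates closest-first.
import math

def rank_by_proximity(target, candidates):
    """candidates: list of (value, description); returns closest-first."""
    def score(value):
        return abs(math.log(value) - math.log(target))
    return sorted(candidates, key=lambda c: score(c[0]))

# hypothetical candidate formulas for a $600 million target quantity
candidates = [
    (1.2e9, "annual budget of a mid-sized city"),
    (5.5e8, "cost of a sports stadium"),
    (3.0e6, "price of a luxury home"),
]
ranked = rank_by_proximity(6.0e8, candidates)
```

The log scale matters for numbers spanning many orders of magnitude: $550 million is "close" to $600 million in a way $3 million is not, even though both differ by a bounded factor or absolute amount.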
EDU 0:
EDU 1: over the past decade , large-scale supervised learning corpora have enabled machine learning researchers to make substantial advances .
EDU 2: however , to date , there are no large-scale question-answer corpora available .
EDU 3: in this paper we present the 00m factoid question-answer corpus ,
EDU 4: an enormous question-answer pair corpus
EDU 5: produced by applying a novel neural network architecture to the knowledge base freebase
EDU 6: to transduce facts into natural language questions .
EDU 7: the produced question-answer pairs are evaluated
EDU 8: both by human evaluators
EDU 9: and using automatic evaluation metrics ,
EDU 10: including well-established machine translation and sentence similarity metrics .
EDU 11: across all evaluation criteria the question-generation model outperforms the competing template-based baseline .
EDU 12: furthermore , when presented to human evaluators ,
EDU 13: the generated questions appear to be comparable in quality to real human-generated questions .
EDU 0:
EDU 1: many language generation tasks require the production of text
EDU 2: conditioned on both structured and unstructured inputs .
EDU 3: we present a novel neural network architecture
EDU 4: which generates an output sequence
EDU 5: conditioned on an arbitrary number of input functions .
EDU 6: crucially , our approach allows both the choice of conditioning context and the granularity of generation ,
EDU 7: for example characters or tokens ,
EDU 8: to be marginalised ,
EDU 9: thus permitting scalable and effective training .
EDU 10: using this framework ,
EDU 11: we address the problem
EDU 12: of generating programming code from a mixed natural language and structured specification .
EDU 13: we create two new data sets for this paradigm
EDU 14: derived from the collectible trading card games magic the gathering and hearthstone .
EDU 15: on these , and a third preexisting corpus , we demonstrate
EDU 16: that marginalising multiple predictors allows our model to outperform strong benchmarks .
EDU 0:
EDU 1: research on generating referring expressions has so far mostly focussed on `` one-shot reference '' ,
EDU 2: where the aim is to generate a single , discriminating expression .
EDU 3: in interactive settings , however , it is not uncommon for reference to be established in `` installments '' ,
EDU 4: where referring information is offered piecewise
EDU 5: until success has been confirmed .
EDU 6: we show
EDU 7: that this strategy can also be advantageous in technical systems
EDU 8: that only have uncertain access to object attributes and categories .
EDU 9: we train a recently introduced model of grounded word meaning on a data set of res for objects in images
EDU 10: and learn to predict semantically appropriate expressions .
EDU 11: in a human evaluation , we observe
EDU 12: that users are sensitive to inadequate object names -
EDU 13: which unfortunately are not unlikely to be generated from low-level visual input .
EDU 14: we propose a solution
EDU 15: inspired by human task-oriented interaction
EDU 16: and implement strategies
EDU 17: for avoiding and repairing semantically inaccurate words .
EDU 18: we enhance a word-based reg with context-aware , referential installments
EDU 19: and find
EDU 20: that they substantially improve the referential success of the system .
EDU 0:
EDU 1: entity resolution is the task
EDU 2: of linking each mention of an entity in text to the corresponding record in a knowledge base ( kb ) .
EDU 3: coherence models for entity resolution encourage all referring expressions in a document to resolve to entities
EDU 4: that are related in the kb .
EDU 5: we explore attention-like mechanisms for coherence ,
EDU 6: where the evidence for each candidate is based on a small set of strong relations , rather than relations to all other entities in the document .
EDU 7: the rationale is that document-wide support may simply not exist for non-salient entities , or entities
EDU 8: not densely connected in the kb .
EDU 9: our proposed system outperforms state-of-the-art systems on the conll 0000 , tac kbp 0000 , 0000 and 0000 tasks .
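The attention-like coherence scoring described above can be sketched as follows: a candidate entity is supported by its k strongest relations to the document's other candidates, rather than by the sum over all of them. The relation scores and k are toy values, not the system's learned parameters.

```python
# Sketch: support a candidate with only its k strongest relations,
# instead of summing relations to every other entity in the document.

def coherence_score(candidate, others, relation_score, k=2):
    scores = sorted((relation_score(candidate, o) for o in others), reverse=True)
    return sum(scores[:k])

# illustrative relation strengths between candidate entities
rel = {("A", "B"): 0.9, ("A", "C"): 0.1, ("A", "D"): 0.05}
score = coherence_score("A", ["B", "C", "D"], lambda a, b: rel.get((a, b), 0.0), k=2)
```

Capping the evidence at k relations is what lets a non-salient entity with one or two strong KB links still score well.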
EDU 0:
EDU 1: interpretability and discriminative power are the two most basic requirements for an evaluation metric .
EDU 2: in this paper , we report the mention identification effect in the b0 , ceaf , and blanc coreference evaluation metrics
EDU 3: that makes it impossible to interpret their results properly .
EDU 4: the only metric
EDU 5: which is insensitive to this flaw
EDU 6: is muc ,
EDU 7: which , however , is known to be the least discriminative metric .
EDU 8: it is a known fact
EDU 9: that none of the current metrics are reliable .
EDU 10: the common practice for ranking coreference resolvers is to use the average of three different metrics .
EDU 11: however , one cannot expect to obtain a reliable score
EDU 12: by averaging three unreliable metrics .
EDU 13: we propose lea , a link-based entity-aware evaluation metric
EDU 14: that is designed to overcome the shortcomings of the current evaluation metrics .
EDU 15: lea is available as branch lea-scorer in the reference implementation of the official conll scorer .
EDU 0:
EDU 1: a long-standing challenge in coreference resolution has been the incorporation of entity-level information - features
EDU 2: defined over clusters of mentions instead of mention pairs .
EDU 3: we present a neural network based coreference system
EDU 4: that produces high-dimensional vector representations for pairs of coreference clusters .
EDU 5: using these representations ,
EDU 6: our system learns when combining clusters is desirable .
EDU 7: we train the system with a learning-to-search algorithm
EDU 8: that teaches it
EDU 9: which local decisions ( cluster merges ) will lead to a high-scoring final coreference partition .
EDU 10: the system substantially outperforms the current state-of-the-art on the english and chinese portions of the conll 0000 shared task dataset
EDU 11: despite using few hand-engineered features .
EDU 0:
EDU 1: properties of corpora ,
EDU 2: such as the diversity of vocabulary and how tightly related texts cluster together ,
EDU 3: impact the best way
EDU 4: to cluster short texts .
EDU 5: we examine several such properties in a variety of corpora
EDU 6: and track their effects on various combinations of similarity metrics and clustering algorithms .
EDU 7: we show
EDU 8: that semantic similarity metrics outperform traditional n-gram and dependency similarity metrics for k-means clustering of a linguistically creative dataset ,
EDU 9: but do not help with less creative texts .
EDU 10: yet the choice of similarity metric interacts with the choice of clustering method .
EDU 11: we find
EDU 12: that graph-based clustering methods perform well on tightly clustered data
EDU 13: but poorly on loosely clustered data .
EDU 14: semantic similarity metrics generate loosely clustered output
EDU 15: even when applied to a tightly clustered dataset .
EDU 16: thus , the best performing clustering systems could not use semantic metrics .
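One of the similarity-metric/clustering combinations the study compares can be sketched as k-means over bag-of-words vectors with cosine similarity for the assignment step. The vectors, k, and iteration count below are toy values, not the paper's experimental setup.

```python
# Minimal k-means with cosine similarity for assignments and
# coordinate-wise means for centroid updates.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def kmeans_cosine(points, centroids, iters=10):
    labels = [0] * len(points)
    for _ in range(iters):
        # assign each point to the most similar centroid
        labels = [max(range(len(centroids)), key=lambda c: cosine(p, centroids[c]))
                  for p in points]
        # recompute each centroid as the mean of its members
        for c in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels

pts = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = kmeans_cosine(pts, [[1.0, 0.0], [0.0, 1.0]])
```

Swapping `cosine` for an n-gram-overlap or embedding-based similarity is exactly the kind of substitution whose effect the abstract tracks.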
EDU 0:
EDU 1: word embedding maps words into a low-dimensional continuous embedding space
EDU 2: by exploiting the local word collocation patterns in a small context window .
EDU 3: on the other hand , topic modeling maps documents onto a low-dimensional topic space ,
EDU 4: by utilizing the global word collocation patterns in the same document .
EDU 5: these two types of patterns are complementary .
EDU 6: in this paper , we propose a generative topic embedding model
EDU 7: to combine the two types of patterns .
EDU 8: in our model , topics are represented by embedding vectors ,
EDU 9: and are shared across documents .
EDU 10: the probability of each word is influenced by both its local context and its topic .
EDU 11: a variational inference method yields the topic embeddings as well as the topic mixing proportions for each document .
EDU 12: jointly they represent the document in a low-dimensional continuous space .
EDU 13: in two document classification tasks , our method performs better than eight existing methods , with fewer features .
EDU 14: in addition , we illustrate with an example
EDU 15: that our method can generate coherent topics
EDU 16: even based on only one document .
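The claim that each word's probability is influenced by both its local context and its topic can be sketched with a softmax over the sum of two dot products, context·w + topic·w. The two-word vocabulary and vectors below are invented stand-ins, not trained embeddings from the model.

```python
# Toy sketch: a word's probability mixes a local-context score and a topic
# score, normalised with a softmax over the vocabulary.
import math

def word_probs(vocab_vecs, context_vec, topic_vec):
    scores = {w: sum(c * x for c, x in zip(context_vec, v)) +
                 sum(t * x for t, x in zip(topic_vec, v))
              for w, v in vocab_vecs.items()}
    z = sum(math.exp(s) for s in scores.values())
    return {w: math.exp(s) / z for w, s in scores.items()}

vocab = {"bank": [1.0, 0.0], "river": [0.0, 1.0]}
probs = word_probs(vocab, context_vec=[0.5, 0.0], topic_vec=[1.0, 0.0])
```

Here a topic vector aligned with "bank" tilts the distribution even though the context alone is weakly informative, which is the complementarity the abstract argues for.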
EDU 0:
EDU 1: news reader comments
EDU 2: found in many on-line news websites
EDU 3: are typically massive in volume .
EDU 4: we investigate the task of cultural-common topic detection ( ctd ) ,
EDU 5: which is aimed at discovering common discussion topics from news reader comments
EDU 6: written in different languages .
EDU 7: we propose a new probabilistic graphical model
EDU 8: called mcta
EDU 9: which can cope with the language gap
EDU 10: and capture the common semantics in different languages .
EDU 11: we also develop a partially collapsed gibbs sampler
EDU 12: which effectively incorporates the term translation relationship into the detection of cultural-common topics for model parameter learning .
EDU 13: experimental results show improvements over the state-of-the-art model .
EDU 0:
EDU 1: document collections often have links between documents - citations , hyperlinks , or revisions -
EDU 2: and which links are added is often based on topical similarity .
EDU 3: to model these intuitions ,
EDU 4: we introduce a new topic model for documents
EDU 5: situated within a network structure ,
EDU 6: integrating latent blocks of documents with a max-margin learning criterion for link prediction
EDU 7: using topic- and word-level features .
EDU 8: experiments on a scientific paper dataset and collection of webpages show
EDU 9: that , by more robustly exploiting the rich link structure within a document network ,
EDU 10: our model improves link prediction , topic quality , and block distributions .
EDU 0:
EDU 1: sentiment analysis ( sa ) is an active research area nowadays
EDU 2: due to the tremendous interest in aggregating and evaluating opinions
EDU 3: being disseminated by users on the web .
EDU 4: sa of english has been thoroughly researched ;
EDU 5: however research on sa of arabic has just flourished .
EDU 6: twitter is considered a powerful tool
EDU 7: for disseminating information and a rich resource for opinionated text
EDU 8: containing views on many different topics .
EDU 9: in this paper we attempt to bridge a gap in arabic sa of twitter
EDU 10: which is the lack of sentiment lexicons
EDU 11: that are tailored for the informal language of twitter .
EDU 12: we generate two lexicons
EDU 13: extracted from a large dataset of tweets
EDU 14: using two approaches
EDU 15: and evaluate their use in a simple lexicon based method .
EDU 16: the evaluation is performed on internal and external datasets .
EDU 17: the performance of these automatically generated lexicons was very promising ,
EDU 18: despite the simplicity of the method
EDU 19: used for classification .
EDU 20: the best f-score obtained was 00.00 % on the internal dataset and 00.0-00.0 % on the external datasets .
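The "simple lexicon based method" used for the evaluation can be sketched as summing the polarity scores of lexicon words found in a tweet and taking the sign. The tiny English lexicon below is a stand-in for illustration only, not one of the generated Arabic lexicons.

```python
# Minimal lexicon-based sentiment classifier: sum word polarities, take the sign.

def classify(tokens, lexicon):
    score = sum(lexicon.get(t, 0.0) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# hypothetical polarity lexicon (word -> signed strength)
lex = {"great": 1.0, "good": 0.5, "bad": -1.0, "awful": -2.0}
label = classify("what an awful day".split(), lex)
```

Because the classifier itself is this simple, almost all of the reported performance rests on the quality of the automatically generated lexicons, which is the abstract's point.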
EDU 0:
EDU 1: this paper proposes an unsupervised approach
EDU 2: for segmenting a multiauthor document into authorial components .
EDU 3: the key novelty is that we utilize the sequential patterns
EDU 4: hidden among document elements
EDU 5: when determining their authorships .
EDU 6: for this purpose , we adopt a hidden markov model ( hmm )
EDU 7: and construct a sequential probabilistic model
EDU 8: to capture the dependencies of sequential sentences and their authorships .
EDU 9: an unsupervised learning method is developed
EDU 10: to initialize the hmm parameters .
EDU 11: experimental results on benchmark datasets have demonstrated the significant benefit of our idea
EDU 12: and our approach has outperformed the state of the art on all tests .
EDU 13: as an example of its applications , the proposed approach is applied
EDU 14: for attributing authorship of a document
EDU 15: and has also shown promising results .
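The sequential model described above can be sketched with Viterbi decoding over an HMM whose hidden states are authors and whose observations are sentences (reduced here to symbolic style features). All probabilities below are toy values, not the parameters learned by the unsupervised initialization method.

```python
# Viterbi decoding of the most likely authorship sequence for a run of sentences.

def viterbi(obs, states, start_p, trans_p, emit_p):
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({}); back.append({})
        for s in states:
            prev = max(states, key=lambda r: V[t - 1][r] * trans_p[r][s])
            V[t][s] = V[t - 1][prev] * trans_p[prev][s] * emit_p[s][obs[t]]
            back[t][s] = prev
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

states = ["author1", "author2"]
start = {"author1": 0.5, "author2": 0.5}
# high self-transition probability encodes that authors write in runs
trans = {"author1": {"author1": 0.9, "author2": 0.1},
         "author2": {"author1": 0.1, "author2": 0.9}}
emit = {"author1": {"style_a": 0.8, "style_b": 0.2},
        "author2": {"style_a": 0.2, "style_b": 0.8}}
path = viterbi(["style_a", "style_a", "style_b", "style_b"],
               states, start, trans, emit)
```

The sticky transition matrix is what captures the "sequential patterns hidden among document elements": isolated off-style sentences do not flip the authorship label.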
EDU 0:
EDU 1: automated text scoring ( ats ) provides a cost-effective and consistent alternative to human marking .
EDU 2: however , in order to achieve good performance ,
EDU 3: the predictive features of the system need to be manually engineered by human experts .
EDU 4: we introduce a model
EDU 5: that forms word representations
EDU 6: by learning the extent
EDU 7: to which specific words contribute to the text 's score .
EDU 8: using long short-term memory networks
EDU 9: to represent the meaning of texts ,
EDU 10: we demonstrate
EDU 11: that a fully automated framework is able to achieve excellent results over similar approaches .
EDU 12: in an attempt to make our results more interpretable ,
EDU 13: and inspired by recent advances in visualizing neural networks ,
EDU 14: we introduce a novel method
EDU 15: for identifying the regions of the text
EDU 16: that the model has found more discriminative .
EDU 0:
EDU 1: digital personal assistants are becoming both more common and more useful .
EDU 2: the major nlp challenge for personal assistants is machine understanding :
EDU 3: translating natural language user commands into an executable representation .
EDU 4: this paper focuses on understanding rules
EDU 5: written as if-then statements ,
EDU 6: though the techniques should be portable to other semantic parsing tasks .
EDU 7: we view understanding as structure prediction
EDU 8: and show improved models
EDU 9: using both conventional techniques and neural network models .
EDU 10: we also discuss various ways
EDU 11: to improve generalization
EDU 12: and reduce overfitting :
EDU 13: synthetic training data from paraphrase , grammar combinations , feature selection and ensembles of multiple systems .
EDU 14: an ensemble of these techniques achieves a new state of the art result with 0 % accuracy improvement .
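The structure-prediction target described above, mapping an if-then command to executable trigger and action slots, can be sketched with a keyword matcher. The matcher and the IFTTT-style channel names below are invented stand-ins for the paper's conventional and neural models.

```python
# Toy sketch: parse a natural-language rule into (trigger, action) slots.

def parse_rule(text, triggers, actions):
    """Return the first trigger and action whose keyword appears in the text."""
    trig = next((t for k, t in triggers.items() if k in text), None)
    act = next((a for k, a in actions.items() if k in text), None)
    return (trig, act)

# hypothetical keyword-to-channel mappings
triggers = {"new photo": "Instagram.AnyNewPhoto"}
actions = {"save it": "Dropbox.AddFile"}
parsed = parse_rule("if there is a new photo then save it to dropbox",
                    triggers, actions)
```

A real system replaces the keyword lookup with learned classifiers over the whole command, but the output structure, a pair of executable slots, is the same.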
EDU 0:
EDU 1: we introduce the treebank of learner english ( tle ) ,
EDU 2: the first publicly available syntactic treebank for english as a second language ( esl ) .
EDU 3: the tle provides manually annotated pos tags and universal dependency ( ud ) trees for 0,000 sentences from the cambridge first certificate in english ( fce ) corpus .
EDU 4: the ud annotations are tied to a pre-existing error annotation of the fce ,
EDU 5: whereby full syntactic analyses are provided for both the original and error corrected versions of each sentence .
EDU 6: furthermore , we delineate esl annotation guidelines
EDU 7: that allow for consistent syntactic treatment of ungrammatical english .
EDU 8: finally , we benchmark pos tagging and dependency parsing performance on the tle dataset
EDU 9: and measure the effect of grammatical errors on parsing accuracy .
EDU 10: we envision the treebank to support a wide range of linguistic and computational research on second language acquisition as well as automatic processing of ungrammatical language .
EDU 0:
EDU 1: neuro-imaging studies on reading different parts of speech ( pos ) report somewhat mixed results ,
EDU 2: yet some of them indicate different activations with different pos .
EDU 3: this paper addresses the difficulty
EDU 4: of using fmri to discriminate between linguistic tokens in reading of running text
EDU 5: because of low temporal resolution .
EDU 6: we show
EDU 7: that once we solve this problem ,
EDU 8: fmri data contains a signal of pos distinctions to the extent
EDU 9: that it improves pos induction with error reductions of more than 0 % .
EDU 0:
EDU 1: relation classification is an important semantic processing task in the field of natural language processing ( nlp ) .
EDU 2: in this paper , we present a novel model brcnn
EDU 3: to classify the relation of two entities in a sentence .
EDU 4: some state-of-the-art systems concentrate on modeling the shortest dependency path ( sdp ) between two entities
EDU 5: leveraging convolutional or recurrent neural networks .
EDU 6: we further explore how to make full use of the dependency relations information in the sdp ,
EDU 7: by combining convolutional neural networks and two-channel recurrent neural networks with long short-term memory ( lstm ) units .
EDU 8: we propose a bidirectional architecture
EDU 9: to learn relation representations with directional information along the sdp forwards and backwards at the same time ,
EDU 10: which benefits classifying the direction of relations .
EDU 11: experimental results show
EDU 12: that our method outperforms the state-of-the-art approaches on the semeval-0000 task 0 dataset .
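The preprocessing step shared by the SDP-based systems above, extracting the shortest dependency path between two entities, can be sketched by treating the dependency tree as an undirected graph and running breadth-first search. The toy parse below is illustrative, not the output of any particular parser.

```python
# Extract the shortest dependency path (SDP) between two words via BFS
# over the dependency tree viewed as an undirected graph.
from collections import deque

def shortest_dep_path(edges, source, target):
    graph = {}
    for head, dep in edges:
        graph.setdefault(head, []).append(dep)
        graph.setdefault(dep, []).append(head)
    queue, seen = deque([[source]]), {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# toy parse: "burst" is the root; the entities attach to it
edges = [("burst", "pressure"), ("burst", "hoses"),
         ("pressure", "the"), ("hoses", "into")]
sdp = shortest_dep_path(edges, "pressure", "hoses")
```

The path's words (and, in the full model, the dependency labels along it) then become the input sequence that the convolutional and recurrent channels encode in both directions.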
EDU 0:
EDU 1: a major challenge of semantic parsing is the vocabulary mismatch problem between natural language and target ontology .
EDU 2: in this paper , we propose a sentence rewriting based semantic parsing method ,
EDU 3: which can effectively resolve the mismatch problem
EDU 4: by rewriting a sentence into a new form
EDU 5: which has the same structure as its target logical form .
EDU 6: specifically , we propose two sentence-rewriting methods for two common types of mismatch :
EDU 7: a dictionary-based method for 0-n mismatch and a template-based method for n-0 mismatch .
EDU 8: we evaluate our sentence rewriting based semantic parser on the benchmark semantic parsing dataset - webquestions .
EDU 9: experimental results show
EDU 10: that our system outperforms the base system with a 0.0 % gain in f0 ,
EDU 11: and generates logical forms more accurately
EDU 12: and parses sentences more robustly .
EDU 0:
EDU 1: while unsupervised anaphoric zero pronoun ( azp ) resolvers have recently been shown to rival their supervised counterparts in performance ,
EDU 2: it is relatively difficult to scale them up
EDU 3: to reach the next level of performance
EDU 4: due to the large amount of feature engineering effort involved and their ineffectiveness in exploiting lexical features .
EDU 5: to address these weaknesses ,
EDU 6: we propose a supervised approach to azp resolution based on deep neural networks ,
EDU 7: taking advantage of their ability
EDU 8: to learn useful task-specific representations
EDU 9: and effectively exploit lexical features via word embeddings .
EDU 10: our approach achieves state-of-the-art performance
EDU 11: when resolving the chinese azps in the ontonotes corpus .
EDU 0:
EDU 1: supervised machine learning models for automated essay scoring ( aes ) usually require substantial task-specific training data
EDU 2: in order to make accurate predictions for a particular writing task .
EDU 3: this limitation hinders their utility , and consequently their deployment in real-world settings .
EDU 4: in this paper , we overcome this shortcoming
EDU 5: using a constrained multi-task pairwise-preference learning approach
EDU 6: that enables the data from multiple tasks to be combined effectively .
EDU 7: furthermore , contrary to some recent research ,
EDU 8: we show
EDU 9: that high-performance aes systems can be built with little or no task-specific training data .
EDU 10: we perform a detailed study of our approach on a publicly available dataset in scenarios
EDU 11: where we have varying amounts of task-specific training data
EDU 12: and in scenarios
EDU 13: where the number of tasks increases .
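The pairwise-preference setup above can be sketched in its simplest form: from preference pairs (better essay, worse essay), derive per-essay scores under which preferred essays rank higher. The win-count heuristic below is a toy stand-in for the constrained multi-task learner, shown only to make the learning signal concrete.

```python
# Toy pairwise-preference scoring: each "win" raises an item's score,
# each "loss" lowers it, yielding a ranking consistent with the pairs.

def rank_from_preferences(pairs):
    scores = {}
    for better, worse in pairs:
        scores[better] = scores.get(better, 0) + 1
        scores[worse] = scores.get(worse, 0) - 1
    return scores

# hypothetical preference pairs, e.g. pooled across several writing tasks
prefs = [("essay_a", "essay_b"), ("essay_a", "essay_c"), ("essay_b", "essay_c")]
scores = rank_from_preferences(prefs)
```

Because preferences are relative, pairs pooled from different writing tasks remain comparable even when the tasks' absolute score scales differ, which is what makes the multi-task combination workable.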
EDU 0:
EDU 1: how can we enable computers to automatically answer questions
EDU 2: like `` who created the character harry potter '' ?
EDU 3: carefully built knowledge bases provide rich sources of facts .
EDU 4: however , it remains a challenge to answer factoid questions
EDU 5: raised in natural language
EDU 6: due to numerous expressions of one question .
EDU 7: in particular , we focus on the most common questions - ones
EDU 8: that can be answered with a single fact in the knowledge base .
EDU 9: we propose cfo , a conditional focused neural-network-based approach
EDU 10: to answering factoid questions with knowledge bases .
EDU 11: our approach first zooms in on a question
EDU 12: to find more probable candidate subject mentions ,
EDU 13: and infers the final answers with a unified conditional probabilistic framework .
EDU 14: powered by deep recurrent neural networks and neural embeddings ,
EDU 15: our proposed cfo achieves an accuracy of 00.0 % on a dataset of 000k questions - the largest public one to date .
EDU 16: it outperforms the current state of the art by an absolute margin of 00.0 % .
EDU 0:
EDU 1: we revisit levin 's theory about the correspondence of verb meaning and syntax
EDU 2: and infer semantic classes from a large syntactic classification of more than 000 german verbs
EDU 3: taking clausal and non-finite arguments .
EDU 4: grasping the meaning components of levin-classes is known to be hard .
EDU 5: we address this challenge
EDU 6: by setting up a multi-perspective semantic characterization of the inferred classes .
EDU 7: to this end , we link the inferred classes and their english translation to independently constructed semantic classes in three different lexicons
EDU 8: - the german wordnet germanet , verbnet and framenet -
EDU 9: and perform a detailed analysis and evaluation of the resulting german-english classification
EDU 10: ( available at www.ukp.tu-darmstadt.de/modality-verbclasses/ ) .
EDU 0:
EDU 1: most of the existing neural machine translation ( nmt ) models focus on the conversion of sequential data
EDU 2: and do not directly use syntactic information .
EDU 3: we propose a novel end-to-end syntactic nmt model ,
EDU 4: extending a sequence-to-sequence model with the source-side phrase structure .
EDU 5: our model has an attention mechanism
EDU 6: that enables the decoder to generate a translated word
EDU 7: while softly aligning it with phrases as well as words of the source sentence .
EDU 8: experimental results on the wat '00 english-to-japanese dataset demonstrate
EDU 9: that our proposed model considerably outperforms sequence-to-sequence attentional nmt models
EDU 10: and compares favorably with the state-of-the-art tree-to-string smt system .
EDU 0:
EDU 1: coordination is an important and common syntactic construction
EDU 2: which is not handled well by state-of-the-art parsers .
EDU 3: coordinations in the penn treebank are missing internal structure in many cases ,
EDU 4: do not include explicit marking of the conjuncts
EDU 5: and contain various errors and inconsistencies .
EDU 6: in this work , we initiated a manual annotation process
EDU 7: for solving these issues .
EDU 8: we identify the different elements in a coordination phrase
EDU 9: and label each element with its function .
EDU 10: we add phrase boundaries
EDU 11: when these are missing ,
EDU 12: unify inconsistencies ,
EDU 13: and fix errors .
EDU 14: the outcome is an extension of the ptb
EDU 15: that includes consistent and detailed structures for coordinations .
EDU 16: we make the coordination annotation publicly available ,
EDU 17: in the hope that it will facilitate further research into coordination disambiguation .
EDU 0:
EDU 1: user traits
EDU 2: disclosed through written text ,
EDU 3: such as age and gender ,
EDU 4: can be used to personalize applications
EDU 5: such as recommender systems or conversational agents .
EDU 6: however , human perception of these traits is not perfectly aligned with reality .
EDU 7: in this paper , we conduct a large-scale crowdsourcing experiment
EDU 8: on guessing age and gender from tweets .
EDU 9: we systematically analyze the quality and possible biases of these predictions .
EDU 10: we identify the textual cues
EDU 11: which lead to misassessments of traits
EDU 12: or make annotators more or less confident in their choice .
EDU 13: our study demonstrates
EDU 14: that differences between real and perceived traits are noteworthy
EDU 15: and elucidates inaccurately used stereotypes in human perception .
EDU 0:
EDU 1: motivated by the findings in social science
EDU 2: that people 's opinions are diverse and variable
EDU 3: while together they are shaped by evolving social norms ,
EDU 4: we perform personalized sentiment classification
EDU 5: via shared model adaptation over time .
EDU 6: in our proposed solution , a global sentiment model is constantly updated
EDU 7: to capture the homogeneity
EDU 8: in which users express opinions ,
EDU 9: while personalized models are simultaneously adapted from the global model
EDU 10: to recognize the heterogeneity of opinions from individuals .
EDU 11: global model sharing alleviates the data sparsity issue ,
EDU 12: and individualized model adaptation enables efficient online model learning .
EDU 13: extensive experiments are performed on two large review collections from amazon and yelp ,
EDU 14: and encouraging performance gain is achieved against several state-of-the-art transfer learning and multi-task learning based sentiment classification solutions .
EDU 0:
EDU 1: our goal is to generate reading lists for students
EDU 2: that help them optimally learn technical material .
EDU 3: existing retrieval algorithms return items directly relevant to a query
EDU 4: but do not return results
EDU 5: to help users read about the concepts
EDU 6: supporting their query .
EDU 7: this is because the dependency structure of concepts
EDU 8: that must be understood
EDU 9: before reading material
EDU 10: pertaining to a given query
EDU 11: is never considered .
EDU 12: here we formulate an information-theoretic view of concept dependency
EDU 13: and present methods
EDU 14: to construct a `` concept graph '' automatically from a text corpus .
EDU 15: we perform the first human evaluation of concept dependency edges
EDU 16: ( to be published as open data ) ,
EDU 17: and the results verify the feasibility of automatic approaches
EDU 18: for inferring concepts and their dependency relations .
EDU 19: this result can support search capabilities
EDU 20: that may be tuned to help users learn a subject rather than retrieve documents
EDU 21: based on a single query .
EDU 0:
EDU 1: we prove
EDU 2: that log-linearly interpolated backoff language models can be efficiently and exactly collapsed into a single normalized backoff model ,
EDU 3: contradicting hsu ( 0000 ) .
EDU 4: while prior work reported
EDU 5: that log-linear interpolation yields lower perplexity than linear interpolation ,
EDU 6: normalizing at query time was impractical .
EDU 7: we normalize the model offline in advance ,
EDU 8: which is efficient due to a recurrence relationship between the normalizing factors .
EDU 9: to tune interpolation weights ,
EDU 10: we apply newton 's method to this convex problem
EDU 11: and show
EDU 12: that the derivatives can be computed efficiently in a batch process .
EDU 13: these findings are combined in a new open-source interpolation tool ,
EDU 14: which is distributed with kenlm .
EDU 15: with 00 out-of-domain corpora , log-linear interpolation yields 00.00 perplexity on ted talks ,
EDU 16: compared to 00.00 for linear interpolation .
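The interpolation scheme in the abstract above can be illustrated with a minimal sketch (toy unigram models with hypothetical numbers, not the paper's implementation): log-linear interpolation multiplies weighted probabilities and so needs a normalizing factor, which the abstract's method precomputes offline, while linear interpolation is normalized by construction.

```python
import math

vocab = ["a", "b", "c"]
p1 = {"a": 0.5, "b": 0.3, "c": 0.2}   # in-domain model (hypothetical numbers)
p2 = {"a": 0.2, "b": 0.3, "c": 0.5}   # out-of-domain model (hypothetical numbers)

def linear(w, word):
    # Linear interpolation: weighted sum, already a valid distribution.
    return w * p1[word] + (1 - w) * p2[word]

def log_linear(w, word):
    # Log-linear interpolation: weighted product, renormalized over the vocab.
    unnorm = {v: p1[v] ** w * p2[v] ** (1 - w) for v in vocab}
    z = sum(unnorm.values())           # normalizing factor, computable offline
    return unnorm[word] / z

w = 0.7
assert abs(sum(linear(w, v) for v in vocab) - 1.0) < 1e-9
assert abs(sum(log_linear(w, v) for v in vocab) - 1e-0) < 1.0  # both sum to 1
```

With `w = 1.0` the log-linear mixture collapses to `p1`, which is one quick sanity check on the normalization.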
EDU 0:
EDU 1: recently a few systems for automatically solving math word problems have reported promising results .
EDU 2: however , the datasets
EDU 3: used for evaluation
EDU 4: have limitations in both scale and diversity .
EDU 5: in this paper , we build a large-scale dataset
EDU 6: which is more than 0 times the size of previous ones ,
EDU 7: and contains many more problem types .
EDU 8: problems in the dataset are semi-automatically obtained from community question-answering ( cqa ) web pages .
EDU 9: a ranking svm model is trained
EDU 10: to automatically extract problem answers from the answer text
EDU 11: provided by cqa users ,
EDU 12: which significantly reduces human annotation cost .
EDU 13: experiments
EDU 14: conducted on the new dataset
EDU 15: lead to interesting and surprising results .
EDU 0:
EDU 1: recent years have seen a dramatic growth in the popularity of word embeddings
EDU 2: mainly owing to their ability
EDU 3: to capture semantic information from massive amounts of textual content .
EDU 4: as a result , many tasks in natural language processing have tried to take advantage of the potential of these distributional models .
EDU 5: in this work , we study how word embeddings can be used in word sense disambiguation , one of the oldest tasks in natural language processing and artificial intelligence .
EDU 6: we propose different methods
EDU 7: through which word embeddings can be leveraged in a state-of-the-art supervised wsd system architecture ,
EDU 8: and perform a deep analysis
EDU 9: of how different parameters affect performance .
EDU 10: we show how a wsd system
EDU 11: that makes use of word embeddings alone ,
EDU 12: if designed properly ,
EDU 13: can provide significant performance improvement over a state-of-the-art wsd system
EDU 14: that incorporates several standard wsd features .
EDU 0:
EDU 1: several large cloze-style context-question-answer datasets have been introduced recently :
EDU 2: the cnn and daily mail news data and the children 's book test .
EDU 3: thanks to the size of these datasets , the associated text comprehension task is well suited for deep-learning techniques
EDU 4: that currently seem to outperform all alternative approaches .
EDU 5: we present a new , simple model
EDU 6: that uses attention
EDU 7: to directly pick the answer from the context
EDU 8: as opposed to computing the answer
EDU 9: using a blended representation of words in the document
EDU 10: as is usual in similar models .
EDU 11: this makes the model particularly suitable for question-answering problems
EDU 12: where the answer is a single word from the document .
EDU 13: an ensemble of our models sets a new state of the art on all evaluated datasets .
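The mechanism described in the abstract above can be sketched in a few lines (a hypothetical toy setup, not the paper's code): attention weights over context positions are summed per candidate word type, and the word with the highest total is picked directly from the context, instead of blending word representations.

```python
import math

context = ["mary", "went", "to", "paris", "and", "mary", "smiled"]
# Hypothetical per-position attention scores from some question encoder.
scores = [2.0, 0.1, 0.1, 1.5, 0.1, 1.0, 0.2]

exp_scores = [math.exp(s) for s in scores]
z = sum(exp_scores)
attn = [e / z for e in exp_scores]      # softmax over context positions

totals = {}
for word, a in zip(context, attn):
    totals[word] = totals.get(word, 0.0) + a  # sum attention per word type

answer = max(totals, key=totals.get)    # pick the answer from the context
```

Here the two occurrences of "mary" pool their attention mass, so the repeated word can win even against a single higher-scoring position.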
EDU 0:
EDU 1: we investigate the use of deep bidirectional lstms for joint extraction of opinion entities and the is-from and is-about relations
EDU 2: that connect them
EDU 3: - the first such attempt using a deep learning approach .
EDU 4: perhaps surprisingly , we find
EDU 5: that standard lstms are not competitive with a state-of-the-art crf+ilp joint inference approach ( yang and cardie , 0000 ) to opinion entity extraction ,
EDU 6: performing below even the standalone sequence-tagging crf .
EDU 7: incorporating sentence-level and a novel relation-level optimization , however , allows the lstm to identify opinion relations
EDU 8: and to perform within 0-0 % of the state-of-the-art joint model for opinion entities and the is-from relation ;
EDU 9: and to perform as well as the state-of-the-art for the is-about relation
EDU 10: - all without access to opinion lexicons , parsers and other preprocessing components
EDU 11: required for the feature-rich crf+ilp approach .
EDU 0:
EDU 1: this paper proposes a left-corner parser
EDU 2: which can identify nonlocal dependencies .
EDU 3: our parser integrates nonlocal dependency identification into a transition-based system .
EDU 4: we use a structured perceptron
EDU 5: which enables our parser to utilize global features
EDU 6: captured by nonlocal dependencies .
EDU 7: an experimental result demonstrates
EDU 8: that our parser achieves a good balance between constituent parsing and nonlocal dependency identification .
EDU 0:
EDU 1: we present the siamese continuous bag of words ( siamese cbow ) model , a neural network for efficient estimation of high-quality sentence embeddings .
EDU 2: averaging the embeddings of words in a sentence has proven to be a surprisingly successful and efficient way
EDU 3: of obtaining sentence embeddings .
EDU 4: however , word embeddings
EDU 5: trained with the methods currently available
EDU 6: are not optimized for the task of sentence representation ,
EDU 7: and , thus , likely to be suboptimal .
EDU 8: siamese cbow handles this problem
EDU 9: by training word embeddings directly
EDU 10: for the purpose of being averaged .
EDU 11: the underlying neural network learns word embeddings
EDU 12: by predicting , from a sentence representation , its surrounding sentences .
EDU 13: we show the robustness of the siamese cbow model
EDU 14: by evaluating it on 00 datasets
EDU 15: stemming from a wide variety of sources .
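The averaging step that siamese cbow optimizes for can be sketched with toy 3-dimensional vectors (purely illustrative numbers, not trained embeddings): a sentence embedding is the mean of its word embeddings, and sentences are compared by cosine similarity.

```python
import math

# Hypothetical word embeddings; real models use hundreds of dimensions.
emb = {
    "cats": [1.0, 0.0, 1.0],
    "dogs": [1.0, 0.2, 0.8],
    "purr": [0.0, 1.0, 1.0],
    "bark": [0.1, 1.0, 0.9],
}

def sentence_embedding(words):
    # Average the word vectors dimension-wise.
    dims = len(next(iter(emb.values())))
    return [sum(emb[w][d] for w in words) / len(words) for d in range(dims)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

s1 = sentence_embedding(["cats", "purr"])
s2 = sentence_embedding(["dogs", "bark"])
```

The paper's point is that off-the-shelf embeddings are not trained with this averaging in mind; the sketch shows only the representation being optimized for, not the training objective.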
EDU 0:
EDU 1: can we train a system
EDU 2: that , on any new input , either says `` do n't know ''
EDU 3: or makes a prediction
EDU 4: that is guaranteed to be correct ?
EDU 5: we answer the question in the affirmative
EDU 6: provided our model family is well-specified .
EDU 7: specifically , we introduce the unanimity principle
EDU 8: : only predict when all models consistent with the training data predict the same output .
EDU 9: we operationalize this principle for semantic parsing , the task of mapping utterances to logical forms .
EDU 10: we develop a simple , efficient method
EDU 11: that reasons over the infinite set of all consistent models
EDU 12: by only checking two of the models .
EDU 13: we prove
EDU 14: that our method obtains 000 % precision even with a modest amount of training data from a possibly adversarial distribution .
EDU 15: empirically , we demonstrate the effectiveness of our approach on the standard geoquery dataset .
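The unanimity principle stated in the abstract above can be made concrete with a tiny hypothetical model family (not the paper's semantic-parsing setup): predict on a new input only if every model consistent with the training data agrees; otherwise say "don't know".

```python
# Toy training data and a hand-picked hypothesis class (all hypothetical).
train = [(0, 0), (1, 1)]
models = [
    lambda x: x % 2,        # parity: consistent with the training data
    lambda x: min(x, 1),    # clip at 1: also consistent
    lambda x: 1 - x % 2,    # inverted parity: inconsistent, gets filtered out
]

def consistent(m):
    # A model is consistent if it reproduces every training label.
    return all(m(x) == y for x, y in train)

def predict(x):
    # Unanimity: answer only when all consistent models agree.
    outputs = {m(x) for m in models if consistent(m)}
    return outputs.pop() if len(outputs) == 1 else "don't know"
```

On input `1` the two consistent models agree, so a prediction is made; on input `2` they disagree (`0` vs. `1`), so the system abstains, which is exactly how the principle trades coverage for guaranteed precision.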
EDU 0:
EDU 1: dialogue topic tracking is a sequential labelling problem
EDU 2: of recognizing the topic state at each time step in given dialogue sequences .
EDU 3: this paper presents various artificial neural network models for dialogue topic tracking ,
EDU 4: including convolutional neural networks
EDU 5: to account for semantics at each individual utterance ,
EDU 6: and recurrent neural networks
EDU 7: to account for conversational contexts along multiple turns in the dialogue history .
EDU 8: the experimental results demonstrate
EDU 9: that our proposed models can significantly improve the tracking performances in human-human conversations .
EDU 0:
EDU 1: lexico-semantic knowledge of our native language provides an initial foundation for second language learning .
EDU 2: in this paper , we investigate whether and to what extent the lexico-semantic models of the native language ( l0 ) are transferred to the second language ( l0 ) .
EDU 3: specifically , we focus on the problem of lexical choice
EDU 4: and investigate it in the context of three typologically diverse languages :
EDU 5: russian , spanish and english .
EDU 6: we show
EDU 7: that a statistical semantic model
EDU 8: learned from l0 data
EDU 9: improves automatic error detection in l0 for the speakers of the respective l0 .
EDU 10: finally , we investigate whether the semantic model
EDU 11: learned from a particular l0
EDU 12: is portable to other , typologically related languages .
EDU 0:
EDU 1: fill-in-the-blank items are commonly featured in computer-assisted language learning ( call ) systems .
EDU 2: an item displays a sentence with a blank ,
EDU 3: and often proposes a number of choices
EDU 4: for filling it .
EDU 5: these choices should include one correct answer and several plausible distractors .
EDU 6: we describe a system
EDU 7: that , given an english corpus ,
EDU 8: automatically generates distractors
EDU 9: to produce items for preposition usage .
EDU 10: we report a comprehensive evaluation on this system ,
EDU 11: involving both experts and learners .
EDU 12: first , we analyze the difficulty levels of machine-generated carrier sentences and distractors ,
EDU 13: comparing several methods
EDU 14: that exploit learner error and learner revision patterns .
EDU 15: we show
EDU 16: that the quality of machine-generated items approaches that of human-crafted ones .
EDU 17: further , we investigate the extent
EDU 18: to which mismatched l0 between the user and the learner corpora affects the quality of distractors .
EDU 19: finally , we measure the system 's impact on the user 's language proficiency in both the short and the long term .
EDU 0:
EDU 1: we present persona-based models
EDU 2: for handling the issue of speaker consistency in neural response generation .
EDU 3: a speaker model encodes personas in distributed embeddings
EDU 4: that capture individual characteristics
EDU 5: such as background information and speaking style .
EDU 6: a dyadic speaker-addressee model captures properties of interactions between two interlocutors .
EDU 7: our models yield qualitative performance improvements in both perplexity and bleu scores over baseline sequence-to-sequence models , with similar gains in speaker consistency
EDU 8: as measured by human judges .
EDU 0:
EDU 1: deep random walk ( deepwalk ) can learn a latent space representation
EDU 2: for describing the topological structure of a network .
EDU 3: however , for relational network classification , deepwalk can be suboptimal
EDU 4: as it lacks a mechanism
EDU 5: to optimize the objective of the target task .
EDU 6: in this paper , we present discriminative deep random walk ( ddrw ) , a novel method for relational network classification .
EDU 7: by solving a joint optimization problem ,
EDU 8: ddrw can learn the latent space representations
EDU 9: that capture the topological structure well
EDU 10: and meanwhile are discriminative for the network classification task .
EDU 11: our experimental results on several real social networks demonstrate
EDU 12: that ddrw significantly outperforms deepwalk on multilabel network classification tasks ,
EDU 13: while retaining the topological structure in the latent space .
EDU 14: ddrw is stable
EDU 15: and consistently outperforms the baseline methods across various percentages of labeled data .
EDU 16: ddrw is also an online method
EDU 17: that is scalable
EDU 18: and can be naturally parallelized .
EDU 0:
EDU 1: automatically recognising medical concepts
EDU 2: mentioned in social media messages
EDU 3: ( e.g. tweets )
EDU 4: enables several applications
EDU 5: for enhancing health quality of people in a community ,
EDU 6: e.g. real-time monitoring of infectious diseases in a population .
EDU 7: however , the discrepancy between the type of language
EDU 8: used in social media and medical ontologies
EDU 9: poses a major challenge .
EDU 10: existing studies deal with this challenge
EDU 11: by employing techniques ,
EDU 12: such as lexical term matching and statistical machine translation .
EDU 13: in this work , we handle the medical concept normalisation at the semantic level .
EDU 14: we investigate the use of neural networks
EDU 15: to learn the transition between layman 's language
EDU 16: used in social media messages
EDU 17: and formal medical language
EDU 18: used in the descriptions of medical concepts in a standard ontology .
EDU 19: we evaluate our approaches
EDU 20: using three different datasets ,
EDU 21: where social media texts are extracted from twitter messages and blog posts .
EDU 22: our experimental results show
EDU 23: that our proposed approaches significantly and consistently outperform existing effective baselines ,
EDU 24: which achieved state-of-the-art performance on several medical concept normalisation tasks ,
EDU 25: by up to 00 % .
EDU 0:
EDU 1: we introduce an agreement-based approach
EDU 2: to learning parallel lexicons and phrases from non-parallel corpora .
EDU 3: the basic idea is to encourage two asymmetric latent-variable translation models
EDU 4: ( i.e. , source-to-target and target-to-source )
EDU 5: to agree on identifying latent phrase and word alignments .
EDU 6: the agreement is defined at both word and phrase levels .
EDU 7: we develop a viterbi em algorithm
EDU 8: for jointly training the two unidirectional models efficiently .
EDU 9: experiments on the chinese-english dataset show
EDU 10: that agreement-based learning significantly improves both alignment and translation performance .
EDU 0:
EDU 1: recently , there has been rising interest
EDU 2: in modelling the interactions of text pairs with deep neural networks .
EDU 3: in this paper , we propose a model of deep fusion lstms ( df-lstms )
EDU 4: to model the strong interaction of text pairs in a recursive matching way .
EDU 5: specifically , df-lstms consist of two interdependent lstms ,
EDU 6: each of which models a sequence under the influence of another .
EDU 7: we also use external memory
EDU 8: to increase the capacity of lstms ,
EDU 9: thereby possibly capturing more complicated matching patterns .
EDU 10: experiments on two very large datasets demonstrate the efficacy of our proposed architecture .
EDU 11: furthermore , we present an elaborate qualitative analysis of our models ,
EDU 12: giving an intuitive understanding
EDU 13: of how our model works .
EDU 0:
EDU 1: we construct a humans-in-the-loop supervised learning framework
EDU 2: that integrates crowdsourcing feedback and local knowledge
EDU 3: to detect job-related tweets from individual and business accounts .
EDU 4: using data-driven ethnography ,
EDU 5: we examine discourse about work
EDU 6: by fusing language-based analysis with temporal , geospatial , and labor statistics information .
EDU 0:
EDU 1: nearly all previous work on neural machine translation ( nmt ) has used quite restricted vocabularies , perhaps with a subsequent method
EDU 2: to patch in unknown words .
EDU 3: this paper presents a novel word-character solution
EDU 4: to achieving open vocabulary nmt .
EDU 5: we build hybrid systems
EDU 6: that translate mostly at the word level
EDU 7: and consult the character components for rare words .
EDU 8: our character-level recurrent neural networks compute source word representations
EDU 9: and recover unknown target words
EDU 10: when needed .
EDU 11: the twofold advantage of such a hybrid approach is that it is much faster and easier
EDU 12: to train than character-based ones ;
EDU 13: at the same time , it never produces unknown words as in the case of word-based models .
EDU 14: on the wmt '00 english to czech translation task , this hybrid approach offers an additional boost of 0.0 to 00.0 bleu points over models
EDU 15: that already handle unknown words .
EDU 16: our best system achieves a new state-of-the-art result with 00.0 bleu score .
EDU 17: we demonstrate
EDU 18: that our character models can successfully learn to not only generate well-formed words for czech , a highly-inflected language with a very complex vocabulary ,
EDU 19: but also build correct representations for english source words .
EDU 0:
EDU 1: tense , temporal adverbs , and temporal connectives provide information about when events
EDU 2: described in english sentences
EDU 3: occur .
EDU 4: to extract this temporal information from a sentence ,
EDU 5: it must be parsed into a semantic representation
EDU 6: which captures the meaning of tense , temporal adverbs , and temporal connectives .
EDU 7: representations were developed for the basic tenses , some temporal adverbs , as well as some of the temporal connectives .
EDU 8: five criteria were suggested
EDU 9: for judging these representations ,
EDU 10: and based on these criteria the representations were judged .
EDU 0:
EDU 1: this paper describes the sdc pundit ( prolog understands integrated text ) system
EDU 2: for processing natural language messages .
EDU 3: pundit ,
EDU 4: written in prolog ,
EDU 5: is a highly modular system
EDU 6: consisting of distinct syntactic , semantic and pragmatics components .
EDU 7: each component draws on one or more sets of data ,
EDU 8: including a lexicon , a broad-coverage grammar of english , semantic verb decompositions , rules
EDU 9: mapping between syntactic and semantic constituents ,
EDU 10: and a domain model .
EDU 11: this paper discusses the communication between the syntactic , semantic and pragmatic modules
EDU 12: that is necessary for making implicit linguistic information explicit .
EDU 13: the key is letting syntax and semantics recognize missing linguistic entities as implicit entities ,
EDU 14: so that they can be labelled as such ,
EDU 15: and reference resolution can be directed
EDU 16: to find specific referents for the entities .
EDU 17: in this way the task
EDU 18: of making implicit linguistic information explicit
EDU 19: becomes a subset of the tasks
EDU 20: performed by reference resolution .
EDU 21: the success of this approach is dependent on marking missing syntactic constituents as elided
EDU 22: and missing semantic roles as essential
EDU 23: so that reference resolution can know when to look for referents .
EDU 0:
EDU 1: we discuss ways
EDU 2: of allowing the users of a natural language processor to define , examine , and modify the definitions of any domain-specific words or phrases
EDU 3: known to the system .
EDU 4: an implementation of this work forms a critical portion of the knowledge acquisition component of our transportable english-language interface ( tell ) ,
EDU 5: which answers english questions about tabular ( first normal-form ) data files
EDU 6: and runs on a symbolics lisp machine .
EDU 7: however , our techniques enable the design of customization modules
EDU 8: that are largely independent of the syntactic and retrieval components of the specific system
EDU 9: they supply information to .
EDU 10: in addition to its obvious practical value , this area of research is important
EDU 11: because it requires careful attention to the formalisms
EDU 12: used by a natural language system
EDU 13: and to the interactions among the modules
EDU 14: based on those formalisms .
EDU 0:
EDU 1: an important goal of computational linguistics has been to use linguistic theory
EDU 2: to guide the construction of computationally efficient real-world natural language processing systems .
EDU 3: at first glance , generalized phrase structure grammar ( gpsg ) appears to be a blessing on two counts .
EDU 4: first , the precise formalisms of gpsg might be a direct and transparent guide for parser design and implementation .
EDU 5: second , since gpsg has weak context-free generative power
EDU 6: and context-free languages can be parsed in o(n)
EDU 7: by a wide range of algorithms ,
EDU 8: gpsg parsers would appear to run in polynomial time .
EDU 9: this widely-assumed gpsg " efficient parsability " result is misleading :
EDU 10: here we prove
EDU 11: that the universal recognition problem for current gpsg theory is exponential-polynomial time hard , and assuredly intractable .
EDU 12: the paper pinpoints sources of complexity
EDU 13: ( e.g. metarules and the theory of syntactic features )
EDU 14: in the current gpsg theory
EDU 15: and concludes with some linguistically and computationally motivated restrictions on gpsg .
EDU 0:
EDU 1: taken abstractly ,
EDU 2: the two-level ( kimmo ) morphological framework allows computationally difficult problems to arise .
EDU 3: for example , n + 0 small automata are sufficient to encode the boolean satisfiability problem ( sat ) for formulas in n variables .
EDU 4: however , the suspicion arises
EDU 5: that natural-language problems may have a special structure
EDU 6: -- not shared with sat --
EDU 7: that is not directly captured in the two-level model .
EDU 8: in particular , the natural problems may generally have a modular and local nature
EDU 9: that distinguishes them from more " global " sat problems .
EDU 10: by exploiting this structure ,
EDU 11: it may be possible to solve the natural problems by methods
EDU 12: that do not involve combinatorial search .
EDU 13: we have explored this possibility in a preliminary way
EDU 14: by applying constraint propagation methods to kimmo generation and recognition .
EDU 15: constraint propagation can succeed
EDU 16: when the solution falls into place step-by-step through a chain of limited and local inferences ,
EDU 17: but it is insufficiently powerful to solve unnaturally hard sat problems .
EDU 18: limited tests indicate
EDU 19: that the constraint-propagation algorithm for kimmo generation works for english , turkish , and warlpiri .
EDU 20: when applied to a kimmo system
EDU 21: that encodes sat problems ,
EDU 22: the algorithm succeeds on " easy " sat problems
EDU 23: but fails ( as desired ) on " hard " problems .
EDU 0:
EDU 1: morphological analysis must take into account the spelling-change processes of a language as well as its possible configurations of stems , affixes , and inflectional markings .
EDU 2: the computational difficulty of the task can be clarified
EDU 3: by investigating specific models of morphological processing .
EDU 4: the use of finite-state machinery in the " twolevel " model by kimmo koskenniemi gives it the appearance of computational efficiency ,
EDU 5: but closer examination shows
EDU 6: the model does not guarantee efficient processing .
EDU 7: reductions of the satisfiability problem show
EDU 8: that finding the proper lexical/surface correspondence in a two-level generation or recognition problem can be computationally difficult .
EDU 9: the difficulty increases
EDU 10: if unrestricted deletions ( null characters ) are allowed .
EDU 0:
EDU 1: free-word order languages have long posed significant problems for standard parsing algorithms .
EDU 2: this paper reports on an implemented parser ,
EDU 3: based on governmentbinding theory ( gb ) ( chomsky , 0000 , 0000 ) ,
EDU 4: for a particular free-word order language , warlpiri , an aboriginal language of central australia .
EDU 5: the parser is explicitly designed
EDU 6: to transparently mirror the principles of gb .
EDU 7: the operation of this parsing system is quite different in character from that of a rule-based parsing system , e.g. , a context-free parsing method .
EDU 8: in this system , phrases are constructed via principles of selection , case-marking , caseassignment , and argument-linking , rather than by phrasal rules .
EDU 9: the output of the parser for a sample warlpiri sentence of four words in length is given .
EDU 10: the parser was executed on each of the 00 other permutations of the sentence ,
EDU 11: and it output equivalent parses ,
EDU 12: thereby demonstrating its ability
EDU 13: to correctly handle the highly scrambled sentences
EDU 14: found in warlpiri .
EDU 0:
EDU 1: we examine the relationship between the two grammatical formalisms :
EDU 2: tree adjoining grammars and head grammars .
EDU 3: we briefly investigate the weak equivalence of the two formalisms .
EDU 4: we then turn to a discussion
EDU 5: comparing the linguistic expressiveness of the two formalisms .
EDU 0:
EDU 1: we study the formal and linguistic properties of a class of parenthesis-free categorial grammars
EDU 2: derived from those of ades and steedman
EDU 3: by varying the set of reduction rules .
EDU 4: we characterize the reduction rules
EDU 5: capable of generating context-sensitive languages
EDU 6: as those
EDU 7: having a partial combination rule and a combination rule in the reverse direction .
EDU 8: we show
EDU 9: that any categorial language is a permutation of some context-free language ,
EDU 10: thus inheriting properties dependent on symbol counting only .
EDU 11: we compare some of their properties with other contemporary formalisms .
EDU 0:
EDU 1: conjunctions have always been a source of problems for natural language parsers .
EDU 2: this paper shows
EDU 3: how these problems may be circumvented
EDU 4: using a rule-based , wait-and-see parsing strategy .
EDU 5: a parser is presented
EDU 6: which analyzes conjunction structures deterministically ,
EDU 7: and the specific rules
EDU 8: it uses
EDU 9: are described and illustrated .
EDU 10: this parser appears to be faster for conjunctions than other parsers in the literature
EDU 11: and some comparative timings are given .
EDU 0:
EDU 1: the documentation of ( unbounded-length ) copying and cross-serial constructions in a few languages in the recent literature is usually taken
EDU 2: to mean
EDU 3: that natural languages are slightly context-sensitive .
EDU 4: however , this ignores those copying constructions
EDU 5: which ,
EDU 6: while productive ,
EDU 7: cannot be easily shown to apply to infinite sublanguages .
EDU 8: to allow such finite copying constructions to be taken into account in formal modeling ,
EDU 9: it is necessary to recognize
EDU 10: that natural languages cannot be realistically represented by formal languages of the usual sort .
EDU 11: rather , they must be modeled as families of formal languages or as formal languages with indefinite vocabularies .
EDU 12: once this is done ,
EDU 13: we see copying as a truly pervasive and fundamental process in human language .
EDU 14: furthermore , the absence of mirror-image constructions in human languages means
EDU 15: that it is not enough to extend context-free grammars in the direction of context-sensitivity .
EDU 16: instead , a class of grammars must be found
EDU 17: which handles ( context-sensitive ) copying but not ( context-free ) mirror images .
EDU 18: this suggests
EDU 19: that human linguistic processes use queues rather than stacks ,
EDU 20: making imperative the development of a hierarchy of queue grammars as a counterweight to the chomsky grammars .
EDU 21: a simple class of context-free queue grammars is introduced and discussed .
EDU 0:
EDU 1: we outline a model of generation with revision ,
EDU 2: focusing on improving textual coherence .
EDU 3: we argue
EDU 4: that high quality text is more easily produced
EDU 5: by iteratively revising and regenerating ,
EDU 6: as people do ,
EDU 7: rather than by using an architecturally more complex single pass generator .
EDU 8: as a general area of study , the revision process presents interesting problems :
EDU 9: recognition of flaws in text requires a descriptive theory
EDU 10: of what constitutes well written prose
EDU 11: and a parser
EDU 12: which can build a representation in those terms .
EDU 13: improving text requires associating flaws with strategies for improvement .
EDU 14: the strategies , in turn , need to know what adjustments to the decisions
EDU 15: made during the initial generation
EDU 16: will produce appropriate modifications to the text .
EDU 17: we compare our treatment of revision with those of mann and moore ( 0000 ) , gabriel ( 0000 ) , and mann ( 0000 ) .
EDU 0:
EDU 1: as a user interacts with a database or expert system ,
EDU 2: she / he may reveal a misconception about the objects
EDU 3: modeled by the system .
EDU 4: this paper discusses the romper system
EDU 5: for responding to such misconceptions in a domain independent and context sensitive fashion .
EDU 6: romper reasons about possible sources of the misconception .
EDU 7: it operates on a model of the user
EDU 8: and generates a cooperative response
EDU 9: based on this reasoning .
EDU 10: the process is made context sensitive
EDU 11: by augmenting the user model with a new notion of object perspective
EDU 12: which highlights certain aspects of the user model due to previous discourse .
EDU 0:
EDU 1: here we address the problem
EDU 2: of mapping phrase meanings into their conceptual representations .
EDU 3: figurative phrases are pervasive in human communication ,
EDU 4: yet they are difficult to explain theoretically .
EDU 5: in fact , the ability
EDU 6: to handle idiosyncratic behavior of phrases
EDU 7: should be a criterion for any theory of lexical representation .
EDU 8: due to the huge number of such phrases in the english language ,
EDU 9: phrase representation must be amenable to parsing , generation , and also to learning .
EDU 10: in this paper we demonstrate a semantic representation
EDU 11: which facilitates , for a wide variety of phrases , both learning and parsing .
EDU 0:
EDU 1: natural language processing systems need large lexicons
EDU 2: containing explicit information about lexical-semantic relationships , selection restrictions , and verb categories .
EDU 3: because the labor
EDU 4: involved in constructing such lexicons by hand
EDU 5: is overwhelming ,
EDU 6: we have been trying to construct lexical entries automatically from information available in the machine-readable version of webster 's seventh collegiate dictionary .
EDU 7: this work is rich in implicit information ;
EDU 8: the problem is to make it explicit .
EDU 9: this paper describes methods
EDU 10: for finding taxonomy and set-membership relationships ,
EDU 11: recognizing nouns
EDU 12: that ordinarily represent human beings ,
EDU 13: and identifying active and stative verbs and adjectives .
EDU 0:
EDU 1: dictionary lookup is a computational activity
EDU 2: that can be greatly accelerated
EDU 3: when performed on large amounts of text by a parallel computer such as the connection machine tm computer ( cm ) .
EDU 4: several algorithms for parallel dictionary lookup are discussed ,
EDU 5: including one
EDU 6: that allows the cm to look up words at a rate 000 times that of lookup on a symbolics 0000 lisp machine .
EDU 0:
EDU 1: we propose a mapping between prosodic phenomena and semantico-pragmatic effects
EDU 2: based upon the hypothesis
EDU 3: that intonation conveys information about the intentional as well as the attentional structure of discourse .
EDU 4: in particular , we discuss
EDU 5: how variations in pitch range and choice of accent and tune can help to convey such information as :
EDU 6: discourse segmentation and topic structure , appropriate choice of referent , the distinction between " given " and " new " information , conceptual contrast or parallelism between mentioned items , and subordination relationships between propositions salient in the discourse .
EDU 7: our goals for this research are practical as well as theoretical .
EDU 8: in particular , we are investigating the problem of intonational assignment in synthetic speech .
EDU 0:
EDU 1: while various aspects of syntactic structure have been shown to bear on the determination of phraselevel prosody ,
EDU 2: the text-to-speech field has lacked a robust working system
EDU 3: to test the possible relations between syntax and prosody .
EDU 4: we describe an implemented system
EDU 5: which uses the deterministic parser fidditch
EDU 6: to create the input for a set of prosody rules .
EDU 7: the prosody rules generate a prosody tree
EDU 8: that specifies the location and relative strength of prosodic phrase boundaries .
EDU 9: these specifications are converted to annotations for the bell labs text-to-speech system
EDU 10: that dictate modulations in pitch and duration for the input sentence .
EDU 11: we discuss the results of an experiment
EDU 12: to determine the performance of our system .
EDU 13: we are encouraged by an initial 0 percent error rate
EDU 14: and we see the design of the parser and the modularity of the system
EDU 15: allowing changes
EDU 16: that will upgrade this rate .
EDU 0:
EDU 1: the following proposal is for a japanese sentence analysis method
EDU 2: to be used in a japanese book reading machine .
EDU 3: this method is designed to allow for several candidates in case of ambiguous characters .
EDU 4: each sentence is analyzed
EDU 5: to compose a data structure
EDU 6: by defining the relationship between words and phrases .
EDU 7: this structure
EDU 8: ( named network structure )
EDU 9: involves all possible combinations of syntactically correct phrases .
EDU 10: after network structure has been completed ,
EDU 11: heuristic rules are applied
EDU 12: in order to determine the most probable way
EDU 13: to arrange the phrases
EDU 14: and thus organize the best sentence .
EDU 15: all information about each sentence ---- the pronunciation of each word with its accent and the structure of phrases ---- will be used during speech synthesis .
EDU 16: experiment results reveal :
EDU 17: 00.0 % of all characters were given their correct pronunciation .
EDU 18: using several recognized character candidates is more efficient
EDU 19: than using only first-ranked characters as the input for sentence analysis .
EDU 20: also this facility increases the efficiency of the book reading machine
EDU 21: in that it enables the user to select other ways
EDU 22: to organize sentences .
EDU 0:
EDU 1: a computer program
EDU 2: for synthesizing japanese fundamental frequency contours
EDU 3: implements our theory of japanese intonation .
EDU 4: this theory provides a complete qualitative description of the known characteristics of japanese intonation , as well as a quantitative model of tone-scaling and timing
EDU 5: precise enough to translate straightforwardly into a computational algorithm .
EDU 6: an important aspect of the description is that various features of the intonation pattern are designated to be phonological properties of different types of phrasal units in a hierarchical organization .
EDU 7: this phrasal organization is known to play an important role
EDU 8: in parsing speech .
EDU 9: our research shows it also to be one reflex of intonational prominence ,
EDU 10: and hence of focus and other discourse structures .
EDU 11: the qualitative features of each phrasal level and their implementation in the synthesis program are described .
EDU 0:
EDU 1: in this paper , i describe
EDU 2: how donnellan 's distinction between referential and attributive uses of definite descriptions should be represented in a computational model of reference .
EDU 3: after briefly discussing the significance of donnellan 's distinction ,
EDU 4: i reinterpret it as being three-tiered , relating to object representation , referring intentions , and choice of referring expression .
EDU 5: i then present a cognitive model of referring ,
EDU 6: the components of which correspond to this analysis ,
EDU 7: and discuss the interaction
EDU 8: that takes place among those components .
EDU 9: finally , the implementation of this model , now in progress , is described .
EDU 0:
EDU 1: ambiguities
EDU 2: related to intension and their consequent inference failures
EDU 3: are a diverse group , both syntactically and semantically .
EDU 4: one particular kind of ambiguity
EDU 5: that has received little attention so far
EDU 6: is whether it is the speaker or the third party
EDU 7: to whom a description in an opaque third-party attitude report should be attributed .
EDU 8: the different readings lead to different inferences in a system
EDU 9: modeling the beliefs of external agents .
EDU 10: we propose
EDU 11: that a unified approach to the representation of the alternative readings of intension-related ambiguities can be based on the notion of a descriptor
EDU 12: that is evaluated with respect to intensionality ,
EDU 13: the beliefs of agents , and a time of application .
EDU 14: we describe such a representation ,
EDU 15: built on a standard modal logic ,
EDU 16: and show
EDU 17: how it may be used in conjunction with a knowledge base of background assumptions
EDU 18: to license restricted substitution of equals in opaque contexts .
EDU 0:
EDU 1: a constraint is proposed in the centering approach to pronoun resolution in discourse .
EDU 2: this " property-sharing " constraint requires that two pronominal expressions
EDU 3: that retain the same cb across adjacent utterances
EDU 4: share a certain common grammatical property .
EDU 5: this property is expressed along the dimension of the grammatical function subject for both japanese and english discourses ,
EDU 6: where different pronominal forms are primarily used
EDU 7: to realize the cb .
EDU 8: it is the zero pronominal in japanese , and the ( unstressed ) overt pronoun in english .
EDU 9: the resulting constraint complements the original centering ,
EDU 10: accounting for its apparent violations
EDU 11: and providing a solution to the interpretation of multi-pronominal utterances .
EDU 12: it also provides an alternative account of anaphora interpretation
EDU 13: that appears to be due to structural parallelism .
EDU 14: this reconciliation of centering/focusing and parallelism is a major advantage .
EDU 15: i will then add another dimension
EDU 16: called the " speaker identification "
EDU 17: to the constraint
EDU 18: to handle a group of special cases in japanese discourse .
EDU 19: it indicates a close association between centering and the speaker 's viewpoint ,
EDU 20: and sheds light on what underlies the effect of perception reports on pronoun resolution in general .
EDU 21: these results ,
EDU 22: by drawing on facts in two very different languages ,
EDU 23: demonstrate the cross-linguistic applicability of the centering framework .
EDU 0:
EDU 1: existing models of plan inference ( pi ) in conversation have assumed
EDU 2: that the agent
EDU 3: whose plan is being inferred ( the actor )
EDU 4: and the agent
EDU 5: drawing the inference ( the observer )
EDU 6: have identical beliefs about actions in the domain .
EDU 7: i argue
EDU 8: that this assumption often results in failure of both the pi process and the communicative process
EDU 9: that pi is meant to support .
EDU 10: in particular , it precludes the principled generation of appropriate responses to queries
EDU 11: that arise from invalid plans .
EDU 12: i describe a model of pi
EDU 13: that abandons this assumption .
EDU 14: it rests on an analysis of plans as mental phenomena .
EDU 15: judgements
EDU 16: that a plan is invalid
EDU 17: are associated with particular discrepancies between the beliefs
EDU 18: that the observer ascribes to the actor
EDU 19: when the former believes
EDU 20: that the latter has some plan ,
EDU 21: and the beliefs
EDU 22: that the observer herself holds .
EDU 23: i show
EDU 24: that the content of an appropriate response to a query is affected by the types of any such discrepancies of belief
EDU 25: judged to be present in the plan
EDU 26: inferred to underlie that query .
EDU 27: the pi model
EDU 28: described here
EDU 29: has been implemented in spirit , a small demonstration system
EDU 30: that answers questions about the domain of computer mail .
EDU 0:
EDU 1: to fully understand a sequence of utterances ,
EDU 2: one must be able to infer implicit relationships between the utterances .
EDU 3: although the identification of sets of utterance relationships forms the basis for many theories of discourse ,
EDU 4: the formalization and recognition of such relationships has proven to be an extremely difficult computational task .
EDU 5: this paper presents a plan-based approach to the representation and recognition of implicit relationships between utterances .
EDU 6: relationships are formulated as discourse plans ,
EDU 7: which allows their representation in terms of planning operators and their computation via a plan recognition process .
EDU 8: by incorporating complex inferential processes
EDU 9: relating utterances into a plan-based framework , a formalization and computability not available in the earlier works is provided .
EDU 0:
EDU 1: novice users engaged in task-oriented dialogues with an adviser
EDU 2: to learn how to use an unfamiliar statistical package .
EDU 3: the users ' task was analyzed
EDU 4: and a task structure was derived .
EDU 5: the task structure was used
EDU 6: to segment the dialogue into subdialogues
EDU 7: associated with the subtasks of the overall task .
EDU 8: the representation of the dialogue structure into a hierarchy of subdialogues ,
EDU 9: partly corresponding to the task structure ,
EDU 10: was validated by three converging analyses .
EDU 11: first , the distribution of non-pronominal noun phrases and the distribution of pronominal noun phrases exhibited a pattern consistent with the derived dialogue structure .
EDU 12: non-pronominal noun phrases occurred more frequently at the beginning of subdialogues than later ,
EDU 13: as can be expected
EDU 14: since one of their functions is to indicate topic shifts .
EDU 15: on the other hand , pronominal noun phrases occurred less frequently in the first sentence of the subdialogues than in the following sentences of the subdialogues ,
EDU 16: as can be expected
EDU 17: since they are used
EDU 18: to indicate topic continuity .
EDU 19: second , the distributions of the antecedents of pronominal noun phrases and of non-pronominal noun phrases showed a pattern consistent with the derived dialogue structure .
EDU 20: finally , distinctive clue words and phrases were found reliably at the boundaries of subdialogues with different functions .
EDU 0:
EDU 1: a new method is presented
EDU 2: for simplifying the logical expressions
EDU 3: used to represent utterance meaning in a natural language system .
EDU 4: this simplification method utilizes the encoded knowledge and the limited inference-making capability of a taxonomic knowledge representation system
EDU 5: to reduce the constituent structure of logical expressions .
EDU 6: the specific application is to the problem
EDU 7: of mapping expressions of the meaning representation language to a database language
EDU 8: capable of retrieving actual responses .
EDU 9: particular account is taken of the model-theoretic aspects of this problem .
EDU 0:
EDU 1: consideration of the question of meaning in the framework of linguistics often requires an allusion to sets and other higher-order notions .
EDU 2: the traditional approach to representing and reasoning about meaning in a computational setting has been to use knowledge representation systems
EDU 3: that are either based on first-order logic
EDU 4: or that use mechanisms
EDU 5: whose formal justifications are to be provided after the fact .
EDU 6: in this paper we shall consider the use of a higher-order logic for this task .
EDU 7: we first present a version of definite clauses ( positive horn clauses )
EDU 8: that is based on this logic .
EDU 9: predicate and function variables may occur in such clauses
EDU 10: and the terms in the language are the typed λ-terms .
EDU 11: such term structures have a richness
EDU 12: that may be exploited in representing meanings .
EDU 13: we also describe a higher-order logic programming language ,
EDU 14: called λprolog ,
EDU 15: which represents programs as higher-order definite clauses
EDU 16: and interprets them
EDU 17: using a depth-first interpreter .
EDU 18: a virtue of this language is that it is possible to write programs in it
EDU 19: that integrate syntactic and semantic analyses into one computational paradigm .
EDU 20: this is to be contrasted with the more common practice
EDU 21: of using two entirely different computation paradigms ,
EDU 22: such as dcgs or atns for parsing and frames or semantic nets for semantic processing .
EDU 23: we illustrate such an integration in this language
EDU 24: by considering a simple example ,
EDU 25: and we claim
EDU 26: that its use makes the task of providing formal justifications for the computations
EDU 27: specified much more direct .
EDU 0:
EDU 1: unification-based grammar formalisms use structures
EDU 2: containing sets of features
EDU 3: to describe linguistic objects .
EDU 4: although computational algorithms for unification of feature structures have been worked out in experimental research ,
EDU 5: these algorithms become quite complicated ,
EDU 6: and a more precise description of feature structures is desirable .
EDU 7: we have developed a model
EDU 8: in which descriptions of feature structures can be regarded as logical formulas ,
EDU 9: and interpreted by sets of directed graphs
EDU 10: which satisfy them .
EDU 11: these graphs are , in fact , transition graphs for a special type of deterministic finite automaton .
EDU 12: this semantics for feature structures extends the ideas of pereira and shieber -lsb-00 -rsb- ,
EDU 13: by providing an interpretation for values
EDU 14: which are specified by disjunctions and path values
EDU 15: embedded within disjunctions .
EDU 16: our interpretation differs from that of pereira and shieber
EDU 17: by using a logical model in place of a denotational semantics .
EDU 18: this logical model yields a calculus of equivalences ,
EDU 19: which can be used
EDU 20: to simplify formulas .
EDU 21: unification is attractive ,
EDU 22: because of its generality ,
EDU 23: but it is often computationally inefficient .
EDU 24: our model allows a careful examination of the computational complexity of unification .
EDU 25: we have shown
EDU 26: that the consistency problem for formulas with disjunctive values is np-complete .
EDU 27: to deal with this complexity ,
EDU 28: we describe
EDU 29: how disjunctive values can be specified in a way
EDU 30: which delays expansion
EDU 31: to disjunctive normal form .
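for readers unfamiliar with the operation whose disjunctive variant is shown np-complete above, plain (non-disjunctive) feature-structure unification can be sketched as nested-dictionary merging; this toy code is only an illustration, not the authors' automaton-based model.

```python
# Minimal sketch of feature-structure unification over nested dicts
# (illustrative assumption, not the paper's transition-graph semantics).
def unify(f, g):
    """Return the unification of two feature structures, or None on clash."""
    if not isinstance(f, dict) or not isinstance(g, dict):
        return f if f == g else None  # atomic values must agree
    result = dict(f)
    for key, val in g.items():
        if key in result:
            sub = unify(result[key], val)
            if sub is None:
                return None  # feature clash propagates upward
            result[key] = sub
        else:
            result[key] = val
    return result

a = {"agr": {"num": "sg"}}
b = {"agr": {"per": "3"}}
print(unify(a, b))  # -> {'agr': {'num': 'sg', 'per': '3'}}
```

the np-completeness result concerns the harder case in which values may be disjunctions, so a satisfying combination of disjuncts must be found rather than a single merge computed.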
EDU 0:
EDU 1: in this paper , we show
EDU 2: that higher-order coloured unification - a form of unification
EDU 3: developed for automated theorem proving -
EDU 4: provides a general theory
EDU 5: for modeling the interface between the interpretation process and other sources of linguistic , non semantic information .
EDU 6: in particular , it provides the general theory for the primary occurrence restriction
EDU 7: which ( dalrymple et al. , 0000 ) 's analysis called for .
EDU 0:
EDU 1: a natural next step in the evolution of constraint-based grammar formalisms from rewriting formalisms is to abstract fully away from the details of the grammar mechanism
EDU 2: -- to express syntactic theories purely in terms of the properties of the class of structures
EDU 3: they license .
EDU 4: by focusing on the structural properties of languages rather than on mechanisms
EDU 5: for generating or checking structures
EDU 6: that exhibit those properties ,
EDU 7: this model-theoretic approach can offer simpler and significantly clearer expression of theories
EDU 8: and can potentially provide a uniform formalization ,
EDU 9: allowing disparate theories to be compared on the basis of those properties .
EDU 10: we discuss l0k , p , a monadic second-order logical framework for such an approach to syntax
EDU 11: that has the distinctive virtue of being superficially expressive -
EDU 12: supporting direct statement of most linguistically significant syntactic properties -
EDU 13: but having well-defined strong generative capacity -
EDU 14: languages are definable in l0k , p iff
EDU 15: they are strongly context-free .
EDU 16: we draw examples from the realms of gpsg and gb .
EDU 0:
EDU 1: information retrieval is an important application area of natural-language processing
EDU 2: where one encounters the genuine challenge
EDU 3: of processing large quantities of unrestricted natural-language text .
EDU 4: this paper reports on the application of a few simple , yet robust and efficient nounphrase analysis techniques
EDU 5: to create better indexing phrases for information retrieval .
EDU 6: in particular , we describe a hybrid approach to the extraction of meaningful ( continuous or discontinuous ) subcompounds from complex noun phrases
EDU 7: using both corpus statistics and linguistic heuristics .
EDU 8: results of experiments show
EDU 9: that indexing based on such extracted subcompounds improves both recall and precision in an information retrieval system .
EDU 10: the noun-phrase analysis techniques are also potentially useful for book indexing and automatic thesaurus extraction .
EDU 0:
EDU 1: most natural language processing tasks require lexical semantic information .
EDU 2: automated acquisition of this information would thus increase the robustness and portability of nlp systems .
EDU 3: this paper describes an acquisition method
EDU 4: which makes use of fixed correspondences between derivational affixes and lexical semantic information .
EDU 5: one advantage of this method , and of other methods
EDU 6: that rely only on surface characteristics of language ,
EDU 7: is that the necessary input is currently available .
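the fixed affix-to-semantics correspondences mentioned above can be sketched as a simple lookup; the suffix table here is an invented illustration, not the paper's actual rule set.

```python
# Toy sketch of affix-based lexical semantic acquisition: a fixed table maps
# derivational suffixes to semantic information (table contents are assumed
# for illustration only).
SUFFIX_RULES = {
    "ize": {"category": "verb", "semantics": "causative"},
    "able": {"category": "adjective", "semantics": "capable-of"},
    "ness": {"category": "noun", "semantics": "state"},
}

def acquire(word):
    """Return lexical semantic information inferred from a word's suffix."""
    for suffix, info in SUFFIX_RULES.items():
        if word.endswith(suffix):
            return dict(info, base=word[:-len(suffix)])
    return None  # no known derivational suffix

print(acquire("modernize"))  # -> {'category': 'verb', 'semantics': 'causative', 'base': 'modern'}
```

the appeal noted in the abstract is visible even in this sketch: the only input required is the surface form of the word itself.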
EDU 0:
EDU 1: this paper deals with the discovery , representation , and use of lexical rules ( lrs ) during large-scale semi-automatic computational lexicon acquisition .
EDU 2: the analysis is based on a set of lrs
EDU 3: implemented and tested on the basis of spanish and english business- and finance-related corpora .
EDU 4: we show that ,
EDU 5: though the use of lrs is justified ,
EDU 6: they do not come cost-free .
EDU 7: semi-automatic output checking is required , even with blocking and preemption procedures
EDU 8: built in .
EDU 9: nevertheless , largescope lrs are justified
EDU 10: because they facilitate the unavoidable process of large-scale semi-automatic lexical acquisition .
EDU 11: we also argue
EDU 12: that the place of lrs in the computational process is a complex issue .
EDU 0:
EDU 1: in this paper , we present a new approach for word sense disambiguation ( wsd )
EDU 2: using an exemplar-based learning algorithm .
EDU 3: this approach integrates a diverse set of knowledge sources
EDU 4: to disambiguate word sense ,
EDU 5: including part of speech of neighboring words , morphological form , the unordered set of surrounding words , local collocations , and verb-object syntactic relation .
EDU 6: we tested our wsd program ,
EDU 7: named lexas ,
EDU 8: on both a common data set
EDU 9: used in previous work ,
EDU 10: as well as on a large sense-tagged corpus
EDU 11: that we separately constructed .
EDU 12: lexas achieves a higher accuracy on the common data set ,
EDU 13: and performs better than the most frequent heuristic on the highly ambiguous words in the large corpus
EDU 14: tagged with the refined senses of wordnet .
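the exemplar-based classification described above can be sketched as a nearest-neighbour lookup over feature vectors; the feature names and training pairs below are invented assumptions, not data from lexas.

```python
# Minimal exemplar-based (nearest-neighbour) disambiguation sketch in the
# spirit of the abstract; features and examples are illustrative assumptions.
def overlap(a, b):
    # number of feature values shared by two examples
    return sum(1 for k in a if k in b and a[k] == b[k])

def nn_sense(exemplars, features):
    """exemplars: list of (feature_dict, sense) training pairs.
    Returns the sense of the exemplar most similar to `features`."""
    best, _ = max(exemplars, key=lambda ex: overlap(ex[0], features)), None
    return best[1]

train = [
    ({"pos_left": "DT", "collocate": "river"}, "bank/shore"),
    ({"pos_left": "DT", "collocate": "money"}, "bank/finance"),
]
print(nn_sense(train, {"pos_left": "DT", "collocate": "money"}))  # -> bank/finance
```

a real system would combine the diverse knowledge sources listed above (pos of neighbours, morphology, collocations, verb-object relations) into one such feature vector and use k nearest neighbours rather than a single one.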
EDU 0:
EDU 1: we present an overview of recent work
EDU 2: in which eye movements are monitored
EDU 3: as people follow spoken instructions
EDU 4: to move objects or pictures in a visual workspace .
EDU 5: subjects naturally make saccadic eye-movements to objects
EDU 6: that are closely time-locked to relevant information in the instruction .
EDU 7: thus the eye-movements provide a window into the rapid mental processes
EDU 8: that underlie spoken language comprehension .
EDU 9: we review studies of reference resolution , word recognition , and pragmatic effects on syntactic ambiguity resolution .
EDU 10: our studies show
EDU 11: that people seek to establish reference with respect to their behavioral goals during the earliest moments of linguistic processing .
EDU 12: moreover , referentially relevant non-linguistic information immediately affects how the linguistic input is initially structured .
EDU 0:
EDU 1: we present a natural language interface system
EDU 2: which is based entirely on trained statistical models .
EDU 3: the system consists of three stages of processing :
EDU 4: parsing , semantic interpretation , and discourse .
EDU 5: each of these stages is modeled as a statistical process .
EDU 6: the models are fully integrated ,
EDU 7: resulting in an end-to-end system
EDU 8: that maps input utterances into meaning representation frames .
EDU 0:
EDU 1: this paper describes a system
EDU 2: that leads us to believe in the feasibility
EDU 3: of constructing natural spoken dialogue systems in task-oriented domains .
EDU 4: it specifically addresses the issue of robust interpretation of speech in the presence of recognition errors .
EDU 5: robustness is achieved by a combination of statistical error post-correction , syntactically- and semantically-driven robust parsing , and extensive use of the dialogue context .
EDU 6: we present an evaluation of the system
EDU 7: using time-to-completion and the quality of the final solution
EDU 8: that suggests
EDU 9: that most native speakers of english can use the system successfully with virtually no training .
EDU 0:
EDU 1: this paper addresses the problem
EDU 2: of correcting spelling errors
EDU 3: that result in valid , though unintended words
EDU 4: ( such as peace and piece , or quiet and quite )
EDU 5: and also the problem
EDU 6: of correcting particular word usage errors
EDU 7: ( such as amount and number , or among and between ) .
EDU 8: such corrections require contextual information
EDU 9: and are not handled by conventional spelling programs
EDU 10: such as unix spell .
EDU 11: first , we introduce a method
EDU 12: called trigrams
EDU 13: that uses part-of-speech trigrams
EDU 14: to encode the context .
EDU 15: this method uses a small number of parameters
EDU 16: compared to previous methods
EDU 17: based on word trigrams .
EDU 18: however , it is effectively unable to distinguish among words
EDU 19: that have the same part of speech .
EDU 20: for this case , an alternative feature-based method
EDU 21: called bayes performs better ;
EDU 22: but bayes is less effective than trigrams
EDU 23: when the distinction among words depends on syntactic constraints .
EDU 24: a hybrid method
EDU 25: called tribayes
EDU 26: is then introduced
EDU 27: that combines the best of the previous two methods .
EDU 28: the improvement in performance of tribayes over its components is verified experimentally .
EDU 29: tribayes is also compared with the grammar checker in microsoft word ,
EDU 30: and is found to have substantially higher performance .
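the core of the trigrams method can be sketched as choosing the confusion-set member whose part-of-speech tag best fits the surrounding tag context; the tiny probability table below is an invented assumption, not trained data.

```python
# Illustrative sketch of the pos-trigram idea (toy numbers, not the paper's
# trained model): score each confusion-set word by how well its tag fits
# between the tags of its neighbours.
POS = {"quiet": "JJ", "quite": "RB"}
# P(middle tag | left tag, right tag) -- invented toy probabilities
TRIGRAM = {("DT", "JJ", "NN"): 0.8, ("DT", "RB", "NN"): 0.1}

def choose(confusion_set, left_tag, right_tag):
    return max(confusion_set,
               key=lambda w: TRIGRAM.get((left_tag, POS[w], right_tag), 0.0))

# "a ___ room": determiner on the left, noun on the right
print(choose(["quiet", "quite"], "DT", "NN"))  # -> quiet
```

as the abstract notes, this device is powerless when both candidates carry the same part of speech, which is exactly the case the feature-based bayes component is meant to cover.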
EDU 0:
EDU 1: under categorial grammars
EDU 2: that have powerful rules like composition ,
EDU 3: a simple n-word sentence can have exponentially many parses .
EDU 4: generating all parses is inefficient
EDU 5: and obscures whatever true semantic ambiguities are in the input .
EDU 6: this paper addresses the problem for a fairly general form of combinatory categorial grammar ,
EDU 7: by means of an efficient , correct , and easy to implement normal-form parsing technique .
EDU 8: the parser is proved to find exactly one parse in each semantic equivalence class of allowable parses ;
EDU 9: that is , spurious ambiguity
EDU 10: ( as carefully defined )
EDU 11: is shown to be both safely and completely eliminated .
EDU 0:
EDU 1: in this paper we present a new parsing algorithm for linear indexed grammars ( ligs ) in the same spirit as the one
EDU 2: described in ( vijay-shanker and weir , 0000 ) for tree adjoining grammars .
EDU 3: for a lig l and an input string x of length n , we build a non ambiguous context-free grammar
EDU 4: whose sentences are all ( and exclusively ) valid derivation sequences in l
EDU 5: which lead to x .
EDU 6: we show
EDU 7: that this grammar can be built in ( o ( n^0 ) ) time
EDU 8: and that individual parses can be extracted in linear time with the size of the extracted parse tree .
EDU 9: though this o ( n^0 ) upper bound does not improve over previous results ,
EDU 10: the average case behaves much better .
EDU 11: moreover , practical parsing times can be decreased by some statically performed computations .
EDU 0:
EDU 1: we study the computational complexity of the parsing problem of a variant of lambek categorial grammar
EDU 2: that we call semidirectional .
EDU 3: in semidirectional lambek calculus sdl there is an additional nondirectional abstraction rule
EDU 4: allowing the formula
EDU 5: abstracted over
EDU 6: to appear anywhere in the premise sequent 's left-hand side ,
EDU 7: thus permitting non-peripheral extraction .
EDU 8: sdl grammars are able to generate each context-free language and more than that .
EDU 9: we show
EDU 10: that the parsing problem for semidirectional lambek grammar is np-complete by a reduction of the 0-partition problem .
EDU 0:
EDU 1: this paper describes an algorithm
EDU 2: for computing optimal structural descriptions for optimality theory grammars with context-free position structures .
EDU 3: this algorithm extends tesar 's dynamic programming approach ( tesar , 0000 ) ( tesar , 0000 )
EDU 4: to computing optimal structural descriptions from regular to context-free structures .
EDU 5: the generalization to context free structures creates several complications ,
EDU 6: all of which are overcome
EDU 7: without compromising the core dynamic programming approach .
EDU 8: the resulting algorithm has a time complexity cubic in the length of the input ,
EDU 9: and is applicable to grammars with universal constraints
EDU 10: that exhibit context-free locality .
EDU 0:
EDU 1: this paper introduces to the finite-state calculus a family of directed replace operators .
EDU 2: in contrast to the simple replace expression , upper -> lower ,
EDU 3: defined in karttunen ( 0000 ) ,
EDU 4: the new directed version , upper @ -> lower , yields an unambiguous transducer
EDU 5: if the lower language consists of a single string .
EDU 6: it transduces the input string from left to right ,
EDU 7: making only the longest possible replacement at each point .
EDU 8: a new type of replacement expression , upper @ -> prefix ... suffix , yields a transducer
EDU 9: that inserts text around strings
EDU 10: that are instances of upper .
EDU 11: the symbol ... denotes the matching part of the input
EDU 12: which itself remains unchanged .
EDU 13: prefix and suffix are regular expressions
EDU 14: describing the insertions .
EDU 15: expressions of the type upper @ -> prefix ... suffix may be used
EDU 16: to compose a deterministic parser for a `` local grammar '' in the sense of gross ( 0000 ) .
EDU 17: other useful applications of directed replacement include tokenization and filtering of text streams .
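the left-to-right, longest-match behaviour described above can be approximated with ordinary regular expressions when upper is a finite set of strings; a real implementation compiles a finite-state transducer, so this regex simulation is an assumption for illustration only.

```python
import re

# Sketch of directed replacement for a finite set of UPPER strings, wrapping
# each match as PREFIX ... SUFFIX (the matched text itself stays unchanged).
# Listing longer alternatives first makes Python's regex engine take the
# longest match at each position, mimicking left-to-right longest-match.
def directed_replace(upper_strings, prefix, suffix, text):
    pattern = "|".join(sorted(map(re.escape, upper_strings),
                              key=len, reverse=True))
    return re.sub(pattern, lambda m: prefix + m.group(0) + suffix, text)

# longest match wins at each point: "abc" is bracketed, not just "ab"
print(directed_replace(["ab", "abc"], "[", "]", "xabcx"))  # -> x[abc]x
```

inserting brackets around matched substrings in this way is the same shape of operation the abstract proposes for building a deterministic parser for a local grammar, and for tokenization and filtering.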
EDU 0:
EDU 1: in synchronous rewriting , the productions of two rewriting systems are paired and applied synchronously in the derivation of a pair of strings .
EDU 2: we present a new synchronous rewriting system
EDU 3: and argue
EDU 4: that it can handle certain phenomena
EDU 5: that are not covered by existing synchronous systems .
EDU 6: we also prove some interesting formal/computational properties of our system .